MiX Knowledge

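Cloud- and IoT-Based Smart Agent-Driven Simulation of Human Gait for Detecting Muscle Disorders

Categories: Human-Computer Interaction, Multiagent Systems

Authors: Sina Saadati, Mohammadreza Razzazi

Published: 2024-08-30

Link: http://arxiv.org/abs/2409.14561v1

Abstract: Motion disorders are a significant global health concern and are often managed with pharmacological treatments that may lead to undesirable long-term effects. Current therapeutic strategies cannot differentiate between a patient's healthy and unhealthy muscles, so a targeted approach to distinguishing between muscles is needed. No motion analyzer application currently exists for this purpose. There is also a deep gap in motion analysis software: some studies prioritize simulation and neglect software needs, while others concentrate on computational aspects and disregard the nuances of simulation. We introduce a comprehensive five-phase methodology to analyze the neuromuscular system of the lower body during gait. The first phase employs an innovative IoT-based method to capture motion signals. The second and third phases involve an agent-driven biomechanical model of the lower-body skeleton and a model of human voluntary muscle. Using this agent-driven approach, motion-capture signals can be converted to neural stimuli. In the fourth phase, the simulation results are analyzed by our proposed ensemble neural network framework to detect abnormal motion in each joint. Finally, the results are displayed in a user-friendly graphical interface, which improves the usability of the method. Using the developed application, we simulate the neuromusculoskeletal system of several patients during the gait cycle, enabling the classification of healthy and pathological muscle activity through joint-based analysis. This study leverages cloud computing to create an infrastructure-independent application that is globally accessible. The proposed application enables experts to differentiate between a patient's healthy and unhealthy muscles by simulating the patient's gait.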

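Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis

Categories: Artificial Intelligence, Computer Science and Game Theory, Information Retrieval, Machine Learning, Multiagent Systems

Authors: Chiu-Chou Lin, Yu-Wei Shih, Kuei-Ting Kuo, Yu-Cheng Chen, Chien-Hua Chen, Wei-Chen Chiu, I-Chen Wu

Published: 2024-08-30

Link: http://arxiv.org/abs/2408.17180v1

Abstract: How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions (such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games) is essential for enhancing gameplay and achieving balance. We have developed two advanced measures that extend beyond the simplistic win rate to quantify balance in zero-sum competitive scenarios. These measures are derived from win value estimations, which employ strength rating approximations via the Bradley-Terry model and counter relationship approximations via vector quantization, significantly reducing the computational complexity associated with traditional win value estimations. Throughout the learning process of these models, we identify useful categories of compositions and pinpoint their counter relationships, aligning with the experiences of human players without requiring specific game knowledge. Our methodology hinges on a simple technique to enhance codebook utilization in discrete representation, using a deterministic vector quantization process for an extremely small state space. Our framework has been validated in popular online games, including Age of Empires II, Hearthstone, Brawl Stars, and League of Legends. The accuracy of the observed strength relations in these games is comparable to traditional pairwise win value predictions, while also offering a more manageable complexity for analysis. Ultimately, our findings contribute to a deeper understanding of PvP game dynamics and present a methodology that significantly improves game balance evaluation and design.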
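As a rough illustration of the strength-rating part mentioned in the abstract, the Bradley-Terry model assigns each composition a positive strength and predicts a win probability from the ratio of strengths. The sketch below fits strengths from a win-count matrix with the classic minorization-maximization update; the function names and the toy win counts are hypothetical and are not taken from the paper.

```python
def bt_win_probability(strength_a, strength_b):
    """Bradley-Terry probability that composition A beats composition B."""
    return strength_a / (strength_a + strength_b)


def fit_bradley_terry(wins, n_iter=200):
    """Estimate strengths from wins[i][j] = number of times i beat j.

    Uses the standard MM update; strengths are rescaled each round so that
    they average to 1 (the model is only defined up to a common scale).
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else strengths[i])
        scale = n / sum(updated)
        strengths = [s * scale for s in updated]
    return strengths


# Hypothetical round-robin results between three team compositions.
toy_wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
ratings = fit_bradley_terry(toy_wins)
print(ratings, bt_win_probability(ratings[0], ratings[1]))
```

A single scalar rating per composition cannot express cyclic (rock-paper-scissors) counter relationships, which is the part the abstract attributes to the vector-quantization component.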

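Particle Flows for Source Localization in 3-D Using TDOA Measurements

Categories: Signal Processing, Multiagent Systems, Systems and Control

Authors: Wenyu Zhang, Mohammad Javad Khojasteh, Florian Meyer

Published: 2024-08-30

Link: http://arxiv.org/abs/2408.17096v1

Abstract: Localization using time-difference of arrival (TDOA) has many applications, for example in passive surveillance systems and marine mammal research. In this paper, we present a Bayesian estimation method that can localize an unknown number of static sources in 3-D based on TDOA measurements. The proposed localization algorithm based on particle flow (PFL) can overcome the challenges related to the highly nonlinear TDOA measurement model, the data association (DA) uncertainty, and the uncertainty in the number of sources to be localized. Different PFL strategies are compared within a unified belief propagation (BP) framework in a challenging multisensor source localization problem. In particular, we consider PFL-based approximation of beliefs based on one or multiple Gaussian kernels, with parameters computed using deterministic and stochastic flow processes. Our numerical results demonstrate that the proposed method can correctly determine the number of sources and provide accurate location estimates. The stochastic flow achieves greater accuracy than the deterministic flow when using the same number of particles.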
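To make the "highly nonlinear TDOA measurement model" concrete, a noise-free TDOA for one sensor pair is the difference of the two source-to-sensor ranges divided by the propagation speed. The sketch below is a minimal illustration of that relationship only; the propagation speed and all coordinates are assumed values, and the noise, data association, and particle-flow parts of the abstract are not modeled.

```python
import math

# Assumed propagation speed in m/s (roughly sound in seawater); hypothetical.
PROPAGATION_SPEED = 1500.0


def tdoa(source, sensor_a, sensor_b, speed=PROPAGATION_SPEED):
    """Noise-free time-difference of arrival between two sensors, in seconds.

    Positive when the signal reaches sensor_b before sensor_a.
    """
    range_a = math.dist(source, sensor_a)
    range_b = math.dist(source, sensor_b)
    return (range_a - range_b) / speed


# Hypothetical 3-D geometry in meters.
source = (0.0, 0.0, 0.0)
sensor_a = (3.0, 4.0, 0.0)  # 5 m from the source
sensor_b = (0.0, 0.0, 1.0)  # 1 m from the source
print(tdoa(source, sensor_a, sensor_b))
```

Every source position on one sheet of a hyperboloid produces the same value for a given sensor pair, which is one reason several sensor pairs are needed to localize a source.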

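MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Categories: Multiagent Systems, Artificial Intelligence, Machine Learning

Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik

Published: 2024-08-29

Link: http://arxiv.org/abs/2409.00134v1

Abstract: Multi-agent pathfinding (MAPF) is a challenging computational problem that typically requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally is NP-hard, yet efficient solutions are critical for numerous applications, including automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Following current trends in machine learning, we have created a foundation model for MAPF problems called MAPF-GPT. Using imitation learning, we trained a policy on a set of pre-collected sub-optimal expert trajectories. The policy can generate actions under partial observability without additional heuristics, reward functions, or communication with other agents. The resulting MAPF-GPT model demonstrates zero-shot learning abilities when solving MAPF problem instances that were not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient in inference mode.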

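Iterative Graph Alignment

Categories: Machine Learning, Artificial Intelligence, Computation and Language, Multiagent Systems

Authors: Fangyuan Yu, Hardeep Singh Arora, Matt Johnson

Published: 2024-08-29

Link: http://arxiv.org/abs/2408.16667v1

Abstract: By compressing diverse narratives, large language models (LLMs) go beyond memorization, achieving intelligence by capturing generalizable causal relationships. However, they suffer from local "representation gaps" due to insufficient training data diversity, which limits their real-world utility, especially in tasks requiring strict alignment to rules. Traditional alignment methods relying on heavy human annotation are inefficient and unscalable. Recent self-alignment techniques also fall short, as they often depend on self-selection-based prompting and memorization-based learning. To address these issues, we introduce Iterative Graph Alignment (IGA), an annotation-free rule-based alignment algorithm. A teacher model (VLM) employs Iterative Graph Prompting (IGP) to create logical graphs and reference answers. The student model (LLM) identifies local knowledge gaps by attempting to align its responses with these references, collaborating with helper models to generate diverse answers. These aligned responses are then used for iterative supervised fine-tuning (SFT). Our evaluations on five rule-based scenarios demonstrate IGP's effectiveness, with a 73.12% alignment improvement in Claude Sonnet 3.5, and Llama3-8B-Instruct achieving an 86.20% improvement, outperforming Claude Sonnet 3.5 in rule-based alignment.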

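Consensus Planning with Primal, Dual, and Proximal Agents

Categories: Optimization and Control, Multiagent Systems

Authors: Alvaro Maggiar, Lee Dicker, Michael Mahoney

Published: 2024-08-29

Link: http://arxiv.org/abs/2408.16462v1

Abstract: Consensus planning is a method for coordinating decision making across complex systems and organizations, including complex supply chain optimization pipelines. It arises when large interdependent distributed agents (systems) share common resources and must act in order to achieve a joint goal. In this paper, we introduce a generic Consensus Planning Protocol (CPP) to solve such problems. Our protocol allows different agents to interact with the coordinating algorithm in different ways (e.g., as a primal, dual, or proximal agent). In prior consensus planning work, all agents have been assumed to have the same interaction pattern (e.g., all dual agents, all primal agents, or all proximal agents), most commonly using the Alternating Direction Method of Multipliers (ADMM) with proximal agents. However, this is often not a valid assumption in practice, where agents consist of large complex systems that we may not be able to modify at will. Our generic CPP allows any mix of agents by combining ADMM-like updates for the proximal agents, dual ascent updates for the dual agents, and linearized ADMM updates for the primal agents. We prove convergence results for the generic CPP, namely a sublinear O(1/k) convergence rate under mild assumptions and two-step linear convergence under stronger assumptions. We also discuss enhancements to the basic method and provide illustrative empirical results.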
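For orientation, the all-proximal special case that the abstract describes as the common prior setting can be sketched with textbook consensus ADMM. The example below is not the paper's generic CPP: it assumes every agent is proximal and has a hypothetical scalar quadratic cost f_i(x) = 0.5 * a_i * (x - b_i)^2, so each agent's update has a closed form.

```python
def consensus_admm(a, b, rho=1.0, n_iter=200):
    """Textbook consensus ADMM for agents with costs 0.5*a[i]*(x - b[i])**2.

    Each agent keeps a local plan x[i] and a price (dual variable) y[i];
    the coordinator keeps the consensus plan z.
    """
    n = len(a)
    y = [0.0] * n
    z = 0.0
    for _ in range(n_iter):
        # Proximal agent step: minimize own cost + price term + penalty to z.
        x = [(a[i] * b[i] - y[i] + rho * z) / (a[i] + rho) for i in range(n)]
        # Coordinator step: average the price-adjusted local plans.
        z = sum(x[i] + y[i] / rho for i in range(n)) / n
        # Price update: dual ascent on each agent's disagreement with z.
        y = [y[i] + rho * (x[i] - z) for i in range(n)]
    return z


# Hypothetical agents; the minimizer of the summed cost is sum(a*b)/sum(a).
weights = [1.0, 2.0, 3.0]
targets = [1.0, 2.0, 3.0]
print(consensus_admm(weights, targets))
```

In this toy problem the consensus value approaches the weighted mean of the agents' targets, which is the minimizer of the summed cost.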

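Parametrization and Convergence of a Primal-Dual Block-Coordinate Approach to Linearly-Constrained Nonsmooth Optimization

Categories: Optimization and Control, Distributed, Parallel, and Cluster Computing, Multiagent Systems, 49M29, 65Y20, 90C25, 49Q22

Authors: Olivier Bilenne

Published: 2024-08-29

Link: http://arxiv.org/abs/2408.16424v1

Abstract: This note is concerned with the problem of minimizing a separable, convex, composite (smooth and nonsmooth) function subject to linear constraints. We study a randomized block-coordinate interpretation of the Chambolle-Pock primal-dual algorithm, based on inexact proximal gradient steps. A distinguishing feature of the considered algorithm is its robustness, as it converges even in the absence of strong duality or when the linear program is inconsistent. Using matrix preconditioning, we derive tight sublinear convergence rates with and without duality assumptions, for both the convex and the strongly convex settings. Our developments are extensions and particularizations of original algorithms proposed by Malitsky (2019) and Luke and Malitsky (2018). Numerical experiments are provided for an optimal transport problem of service pricing.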

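3D Topological Modeling and Multi-Agent Movement Simulation for Viral Infection Risk Analysis

Categories: Multiagent Systems, Computational Engineering, Finance, and Science, Software Engineering

Authors: Wassim Jabi, Yidan Xue, Thomas E. Woolley, Katerina Kaouri

Published: 2024-08-29

Link: http://arxiv.org/abs/2408.16417v1

Abstract: This paper presents a methodology for studying how the design of indoor spaces and the movement of people within them affect disease spread, by integrating computer-aided modeling, multi-agent movement simulation, and airborne viral transmission modeling. Topological spatial design and analysis software is used to model indoor environments, connect spaces, and construct a navigation graph. Pathways are computed on this graph for agents, each of which has unique characteristics such as walking speed, infection status, and activities. Agents follow a schedule of events with specific locations and times. The software calculates a "time-to-leave" based on walking speed and event start times, and agents move along the shortest path within the navigation graph, accurately accounting for obstacles, doorways, and walls. This setup enables precise distance calculations between agents. Viral aerosol concentration is then computed and visualized using a reaction-diffusion equation, and each agent's infection risk is determined with an extension of the Wells-Riley ansatz. This spatio-temporal and topological approach incorporates realistic human behavior and spatial dynamics, improving infection risk simulations. The resulting software is designed as a rapid decision-support tool for policymakers, facility managers, stakeholders, architects, and engineers to mitigate disease spread in existing buildings and inform the design of new ones. The software's effectiveness is demonstrated through a comparative analysis of cellular and open-plan commercial office layouts.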
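The abstract refers to an extension of the Wells-Riley ansatz. As background only, the classic well-mixed Wells-Riley formula gives the infection probability as 1 - exp(-I*q*p*t/Q). The sketch below implements that classic form, not the paper's spatially resolved extension; the parameter names and example values are assumptions for illustration.

```python
import math


def wells_riley_risk(infectors, quanta_rate, breathing_rate, hours, ventilation_rate):
    """Classic well-mixed Wells-Riley infection probability.

    infectors        -- number of infectious people in the room (I)
    quanta_rate      -- quanta emitted per infector per hour (q)
    breathing_rate   -- air inhaled by the susceptible person, m^3/h (p)
    hours            -- exposure time in hours (t)
    ventilation_rate -- clean-air supply to the room, m^3/h (Q)
    """
    inhaled_quanta = infectors * quanta_rate * breathing_rate * hours / ventilation_rate
    return 1.0 - math.exp(-inhaled_quanta)


# Hypothetical scenario: one infector, two hours in a ventilated office.
print(wells_riley_risk(infectors=1, quanta_rate=10.0, breathing_rate=0.5,
                       hours=2.0, ventilation_rate=100.0))
```

The classic form assumes a uniform aerosol concentration throughout the room. The abstract's reaction-diffusion step exists to replace that assumption with a concentration that varies in space and time.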

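Improving the Prediction of Individual Engagement in Recommendations Using Cognitive Models

Categories: Machine Learning, Multiagent Systems

Authors: Roderick Seow, Yunfan Zhao, Duncan Wood, Milind Tambe, Cleotilde Gonzalez

Published: 2024-08-28

Link: http://arxiv.org/abs/2408.16147v1

Abstract: For public health programs with limited resources, the ability to predict how behaviors change over time and in response to interventions is crucial for deciding when and to whom interventions should be allocated. Using data from a real-world maternal health program, we demonstrate how a cognitive model based on Instance-Based Learning (IBL) Theory can augment existing purely computational approaches. Our findings show that, compared to general time-series forecasters (e.g., LSTMs), IBL models, which reflect human decision-making processes, better predict the dynamics of individuals' states. Additionally, IBL provides estimates of the volatility in individuals' states and their sensitivity to interventions, which can improve the training efficiency of other time-series models.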

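Different Facets for Different Experts: A Framework for Streamlining the Integration of Qualitative Insights into ABM Development

Categories: Multiagent Systems, Computers and Society

Authors: Vivek Nallur, Pedram Aghaei, Graham Finlay

Published: 2024-08-28

Link: http://arxiv.org/abs/2408.15725v1

Abstract: A key problem in agent-based simulation is that integrating qualitative insights from experts in multiple disciplines is extremely hard. In most simulations, agent capabilities and the corresponding behavior need to be programmed into the agent. We report on the architecture of a tool that decouples the programmed functions of the agent from the acquisition of capability and the displayed behavior. This allows multiple domain experts to represent qualitative insights without the need for code to be changed. It also allows continuous integration (or even change) of qualitative behavior processes as more insights are gained. The resulting behavior observed in the model is both more faithful to the expert's insight and able to be contrasted against other models representing other insights.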

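TrafficGamer: Reliable and Flexible Traffic Simulation for Safety-Critical Scenarios with Game-Theoretic Oracles

Categories: Artificial Intelligence, Multiagent Systems

Authors: Guanren Qiao, Guorui Quan, Jiawei Yu, Shujun Jia, Guiliang Liu

Published: 2024-08-28

Link: http://arxiv.org/abs/2408.15538v1

Abstract: While modern autonomous vehicle (AV) systems can develop reliable driving policies under regular traffic conditions, they frequently struggle with safety-critical traffic scenarios. This difficulty arises primarily from the rarity of such scenarios in driving datasets and the complexity of predictive modeling among multiple vehicles. To support the testing and refinement of AV policies, simulating safety-critical traffic events is an essential challenge to be addressed. In this work, we introduce TrafficGamer, which facilitates game-theoretic traffic simulation by viewing common road driving as a multi-agent game. In evaluations of empirical performance across various real-world datasets, TrafficGamer ensures both the fidelity and the exploitability of the simulated scenarios: they not only statistically match the real-world traffic distribution but also efficiently capture equilibria that represent safety-critical scenarios involving multiple agents. Additionally, the results demonstrate that TrafficGamer supports highly flexible simulation across various contexts. Specifically, we demonstrate that the generated scenarios can dynamically adapt to equilibria of varying tightness by configuring risk-sensitive constraints during optimization. To the best of our knowledge, TrafficGamer is the first simulator capable of generating diverse traffic scenarios involving multiple agents. We provide a demo webpage for the project at https://qiaoguanren.github.io/trafficgamer-demo/.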

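Graph Attention Inference of Network Topology in Multi-Agent Systems

Categories: Multiagent Systems, Machine Learning

Authors: Akshay Kolli, Reza Azadeh, Kshitj Jerath

Published: 2024-08-27

Link: http://arxiv.org/abs/2408.15449v1

Abstract: Accurately identifying the underlying graph structure of multi-agent systems remains a difficult challenge. Our work introduces a novel machine learning-based solution that leverages the attention mechanism to predict future states of multi-agent systems by learning node representations. The graph structure is then inferred from the strength of the attention values. This approach is applied to both linear consensus dynamics and the nonlinear dynamics of Kuramoto oscillators, so that the graph is learned implicitly by learning good agent representations. Our results demonstrate that the presented data-driven graph attention machine learning model can identify the network topology in multi-agent systems, even when the underlying dynamic model is not known, as evidenced by the F1 scores achieved in link prediction.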
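The last step the abstract describes, reading a graph off the attention strengths and scoring it with F1, can be sketched as follows. This is an illustrative stand-in, not the authors' pipeline: the fixed threshold rule, the function names, and the toy attention matrix are all assumptions.

```python
def infer_edges(attention, threshold):
    """Return directed edges (i, j) whose attention weight reaches the threshold."""
    n = len(attention)
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and attention[i][j] >= threshold
    }


def f1_score(predicted, actual):
    """F1 score of a predicted edge set against the true edge set."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical attention weights for a three-agent system.
toy_attention = [
    [0.0, 0.9, 0.1],
    [0.8, 0.0, 0.7],
    [0.2, 0.1, 0.0],
]
true_edges = {(0, 1), (1, 0), (1, 2), (2, 1)}
predicted_edges = infer_edges(toy_attention, threshold=0.5)
print(sorted(predicted_edges), f1_score(predicted_edges, true_edges))
```

Here three of the four true links are recovered with no false positives, so precision is 1, recall is 0.75, and F1 is 6/7.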

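Decentralized Unlabeled Multi-Agent Pathfinding via Target and Priority Swapping (With Supplementary)

Categories: Multiagent Systems

Authors: Stepan Dergachev, Konstantin Yakovlev

Published: 2024-08-27

Link: http://arxiv.org/abs/2408.14948v1

Abstract: In this paper we study a challenging variant of the multi-agent pathfinding problem (MAPF) in which a set of agents must reach a set of goal locations, but it does not matter which agent reaches a specific goal: Anonymous MAPF (AMAPF). Current optimal and suboptimal AMAPF solvers rely on a centralized controller that is in charge of both target assignment and pathfinding. We extend the state of the art and present the first AMAPF solver capable of solving the problem in a fully decentralized fashion, where each agent makes decisions individually and relies only on local communication with the others. The core of our method is a priority and target swapping procedure tailored to produce consistent goal assignments (i.e., ensuring that no two agents are heading towards the same goal). Coupled with established rule-based path planning, we end up with TP-SWAP, an efficient and flexible approach to solving decentralized AMAPF. On the theoretical side, we prove that TP-SWAP is complete (i.e., TP-SWAP guarantees that each target will be reached by some agent). Empirically, we evaluate TP-SWAP across a wide range of setups and compare it to both centralized and decentralized baselines. In terms of flowtime (a widespread cost objective in MAPF), TP-SWAP outperforms the fully decentralized competitor and can even outperform the semi-decentralized one (i.e., the one relying on an initial consistent goal assignment).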
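Two notions from the abstract are easy to pin down in code: a goal assignment is consistent when no two agents share a goal, and flowtime is the sum over agents of the time steps needed to reach a goal. The helpers below are a hedged sketch with hypothetical names; they assume each path lists one grid cell per time step and ends at the step the agent arrives.

```python
def is_consistent(assignment):
    """True when no two agents are assigned the same goal."""
    goals = list(assignment.values())
    return len(set(goals)) == len(goals)


def flowtime(paths):
    """Sum of arrival times, assuming each path ends when its agent arrives."""
    return sum(len(path) - 1 for path in paths.values())


# Hypothetical two-agent example on a grid.
assignment = {"agent_1": (2, 0), "agent_2": (0, 2)}
paths = {
    "agent_1": [(0, 0), (1, 0), (2, 0)],          # arrives at t = 2
    "agent_2": [(0, 0), (0, 1), (0, 1), (0, 2)],  # waits once, arrives at t = 3
}
print(is_consistent(assignment), flowtime(paths))
```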

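Multi-Agent Path Finding with Real Robot Dynamics and Interdependent Tasks for Automated Warehouses

Categories: Robotics, Artificial Intelligence, Multiagent Systems

Authors: Vassilissa Lehoux-Lebacque, Tomi Silander, Christelle Loiodice, Seungjoon Lee, Albert Wang, Sofia Michel

Published: 2024-08-26

Link: http://arxiv.org/abs/2408.14527v1

Abstract: Multi-Agent Path Finding (MAPF) is an important optimization problem underlying the deployment of robots in automated warehouses and factories. Despite the large body of work on this topic, most approaches make heavy simplifications of both the environment and the agents, which makes the resulting algorithms impractical for real-life scenarios. In this paper, we consider a realistic problem of online order delivery in a warehouse, where a fleet of robots brings the products belonging to each order from shelves to workstations. This creates a stream of interdependent pickup and delivery tasks, and the associated MAPF problem consists of computing realistic collision-free robot trajectories that fulfill these tasks. To solve this MAPF problem, we propose an extension of the standard Prioritized Planning algorithm to deal with interdependent tasks (Interleaved Prioritized Planning) and a novel Via-Point Star (VP*) algorithm to compute an optimal dynamics-compliant robot trajectory that visits a sequence of goal locations while avoiding moving obstacles. We prove the completeness of our approach and evaluate it in simulation as well as in a real warehouse.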

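A Survey on Small-Scale Testbeds for Connected and Automated Vehicles and Robot Swarms

Categories: Robotics, Multiagent Systems

Authors: Armin Mokhtarian, Jianye Xu, Patrick Scheffe, Maximilian Kloock, Simon Schäfer, Heeseung Bang, Viet-Anh Le, Sangeet Ulhas, Johannes Betz, Sean Wilson, Spring Berman, Liam Paull, Amanda Prorok, Bassam Alrifaee

Published: 2024-08-26

Link: http://arxiv.org/abs/2408.14199v1

Abstract: Connected and automated vehicles and robot swarms hold transformative potential for enhancing safety, efficiency, and sustainability in the transportation and manufacturing sectors. Extensive testing and validation of these technologies is crucial for their deployment in the real world. While simulations are essential for initial testing, they often have limitations in capturing the complex dynamics of real-world interactions. This limitation underscores the importance of small-scale testbeds. These testbeds provide a realistic, cost-effective, and controlled environment for testing and validating algorithms, acting as an essential intermediary between simulation and full-scale experiments. This work helps researchers identify existing small-scale testbeds suitable for their experiments and provides insights for those who want to build their own. In addition, it delivers a comprehensive survey of the current landscape of these testbeds. We derive 62 characteristics of testbeds based on the well-known sense-plan-act paradigm and offer an online table comparing 22 small-scale testbeds based on these characteristics. The online table is hosted on our designated public webpage, www.cpm-remote.de/testbeds, and we invite testbed creators and developers to contribute to it. We closely examine nine testbeds in this paper, demonstrating how the derived characteristics can be used to present testbeds. Furthermore, we discuss three ongoing challenges concerning small-scale testbeds that we identified: the small-scale to full-scale transition, sustainability, and power and resource management.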

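Decentralized Stochastic Control in Standard Borel Spaces: Centralized MDP Reductions, Near Optimality of Finite Window Local Information, and Q-Learning

Categories: Optimization and Control, Multiagent Systems, 93E20, 91A99, 90B99

Authors: Omar Mrani-Zentar, Serdar Yüksel

Published: 2024-08-25

Link: http://arxiv.org/abs/2408.13828v1

Abstract: Decentralized stochastic control problems are intrinsically difficult to study because standard tools from centralized control, such as dynamic programming, are inapplicable, and because of the resulting computational complexity. In this paper, we address some of these challenges for decentralized stochastic control with Borel spaces under three different but tightly related information structures, treated under a unified theme: the one-step delayed information sharing pattern, the K-step periodic information sharing pattern, and the completely decentralized information structure in which no information is shared. We show that the one-step delayed and K-step periodic problems can be reduced to a centralized MDP, generalizing prior results that considered finite, linear, or static models, by addressing several measurability questions. The separated nature of policies under both information structures is then established. We then provide sufficient conditions for the transition kernels of both centralized reductions to be weak-Feller, which facilitates rigorous approximation and learning-theoretic results. We then show that, for the completely decentralized control problem, finite-memory local policies are near optimal under a joint conditional mixing condition. This is achieved by obtaining a bound for finite-memory policies that goes to zero as the memory size increases. We also provide a performance bound for the K-periodic problem that results from replacing the full common information with a finite sliding window of information. This bound depends on a condition of predictor stability in expected total variation, which we establish. Finally, we show that under the periodic information sharing pattern, a quantized Q-learning algorithm converges asymptotically to a near-optimal solution. To our knowledge, each of the above is a new contribution to the literature.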

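Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective

Categories: Artificial Intelligence, Multiagent Systems

Authors: Qi Liu, Jianqi Gao, Dongjie Zhu, Xizheng Pang, Pengbin Chen, Jingxiang Guo, Yanjie Li

Published: 2024-08-25

Link: http://arxiv.org/abs/2408.13750v1

Abstract: Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouses. However, most of the literature addresses only one of these two problems at a time. In this study, we propose a method that solves target assignment and path planning simultaneously from the perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the first work to model the TAPF problem for intelligent warehouses as cooperative multi-agent deep RL, and the first to address TAPF simultaneously based on multi-agent deep RL. Furthermore, previous literature rarely considers the physical dynamics of agents, which this study takes into account. Experimental results show that our method performs well in various task settings: the target assignment is solved reasonably well, and the planned paths are almost the shortest. Moreover, our method is more time-efficient than the baselines.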

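DeepVoting: Learning Voting Rules with Tailored Embeddings

Categories: Multiagent Systems, Artificial Intelligence, Computer Science and Game Theory, Machine Learning, General Economics, Economics

Authors: Leonardo Matone, Ben Abramowitz, Nicholas Mattei, Avinash Balakrishnan

Published: 2024-08-24

Link: http://arxiv.org/abs/2408.13630v1

Abstract: Aggregating the preferences of multiple agents into a collective decision is a common step in many important problems across areas of computer science, including information retrieval, reinforcement learning, and recommender systems. As social choice theory has shown, the problem of designing algorithms for aggregation rules with specific properties (axioms) can be difficult, or provably impossible in some cases. Instead of designing algorithms by hand, one can learn aggregation rules, particularly voting rules, from data. However, prior work in this area has required extremely large models or has been limited by the choice of preference representation, i.e., the embedding. We recast the problem of designing a good voting rule as one of learning probabilistic versions of voting rules that output distributions over a set of candidates. Specifically, we use neural networks to learn probabilistic social choice functions from the literature. We show that embeddings of preference profiles derived from the social choice literature allow us to learn existing voting rules more efficiently and to scale to larger populations of voters more easily than other work, provided the embedding is tailored to the learning objective. Moreover, we show that rules learned using embeddings can be tweaked to create novel voting rules with improved axiomatic properties. Namely, we show that existing voting rules require only minor modification to combat a probabilistic version of the No Show Paradox.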
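A "probabilistic version of a voting rule" maps a preference profile to a distribution over candidates rather than to a single winner. One simple hand-written instance, shown below purely as an illustration, selects each candidate with probability proportional to its Borda score. This is not claimed to be one of the rules studied in the paper, and the function names and toy profile are hypothetical.

```python
def borda_scores(profile, candidates):
    """Borda count: with m candidates, position k in a ranking earns m-1-k points."""
    m = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking in profile:
        for position, candidate in enumerate(ranking):
            scores[candidate] += m - 1 - position
    return scores


def probabilistic_borda(profile, candidates):
    """Distribution over candidates proportional to their Borda scores."""
    scores = borda_scores(profile, candidates)
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical profile: three voters ranking three candidates, best first.
candidates = ["a", "b", "c"]
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(probabilistic_borda(profile, candidates))
```

A learned rule in the abstract's sense would be a neural network trained to reproduce such output distributions from an embedding of the profile.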

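Reaching New Heights in Multi-Agent Collective Construction

Categories: Multiagent Systems

Authors: Martin Rameš, Pavel Surynek

Published: 2024-08-24

Link: http://arxiv.org/abs/2408.13615v1

Abstract: We propose a new approach to multi-agent collective construction based on the idea of reversible ramps. Our ReRamp algorithm uses reversible side-ramps to generate construction plans for ramped block structures that are higher and larger than was previously possible with state-of-the-art planning algorithms, given the same building area. We compare the ReRamp algorithm to similar state-of-the-art algorithms on a set of benchmark instances, where we demonstrate its superior computational speed. We also establish in our experiments that the ReRamp algorithm is capable of generating plans for a single-story house, an important milestone on the road to real-world multi-agent construction applications.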

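Hybrid Training for Enhanced Multi-Task Generalization in Multi-Agent Reinforcement Learning

Categories: Machine Learning, Multiagent Systems

Authors: Mingliang Zhang, Sichang Su, Chengyang He, Guillaume Sartoretti

Published: 2024-08-24

Link: http://arxiv.org/abs/2408.13567v1

Abstract: In multi-agent reinforcement learning (MARL), achieving multi-task generalization to diverse agents and objectives presents significant challenges. Existing online MARL algorithms primarily focus on single-task performance, and their lack of multi-task generalization typically results in substantial computational waste and limited real-life applicability. Meanwhile, existing offline multi-task MARL approaches depend heavily on data quality, often resulting in poor performance on unseen tasks. In this paper, we introduce HyGen (Hybrid Training for Enhanced Multi-Task Generalization), a novel hybrid MARL framework that integrates online and offline learning to ensure both multi-task generalization and training efficiency. Specifically, our framework extracts potential general skills from offline multi-task datasets. We then train policies to select the optimal skills under the centralized training and decentralized execution (CTDE) paradigm. During this stage, we use a replay buffer that integrates both offline data and online interactions. We empirically demonstrate that our framework effectively extracts and refines general skills, yielding impressive generalization to unseen tasks. Comparative analyses on the StarCraft Multi-Agent Challenge show that HyGen outperforms a wide range of existing purely online and purely offline methods.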

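Optimizing Collaboration of LLM-Based Agents for Finite Element Analysis

Categories: Artificial Intelligence, Computational Engineering, Finance, and Science, Multiagent Systems

Authors: Chuan Tian, Yilei Zhang

Published: 2024-08-23

Link: http://arxiv.org/abs/2408.13406v1

Abstract: This paper investigates the interactions between multiple agents within large language models (LLMs) in the context of programming and coding tasks. We use the AutoGen framework to facilitate communication among agents, evaluating different configurations based on the success rates from 40 random runs for each setup. The study focuses on developing a flexible automation framework for applying the finite element method (FEM) to solve linear elastic problems. Our findings emphasize the importance of optimizing agent roles and clearly defining their responsibilities, rather than merely increasing the number of agents. Effective collaboration among agents is shown to be crucial for addressing general FEM challenges. This research demonstrates the potential of LLM multi-agent systems to enhance computational automation in simulation methodologies, paving the way for future advancements in engineering and artificial intelligence.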

From Mobilisation to Radicalisation: Probing the Persistence and Radicalisation of Social Movements Using an Agent-Based Model

Categories: Social and Information Networks, Multiagent Systems, Physics and Society

Authors: Emma F. Thomas, Mengbin Ye, Simon D. Angus, Tony J. Mathew, Winnifred Louis, Liam Walsh, Silas Ellery, Morgana Lizzio-Wilson, Craig McGarty

Published: 2024-08-23

Link: http://arxiv.org/abs/2408.12795v1

Abstract: We are living in an age of protest. Although we have an excellent understanding of the factors that predict participation in protest, we understand little about the conditions that foster a sustained (versus transient) movement. How do interactions between supporters and authorities combine to influence whether and how people engage (i.e., using conventional or radical tactics)? This paper introduces a novel, theoretically founded and empirically informed agent-based model (DIMESim) to address these questions. We model the complex interactions between the psychological attributes of the protesters (agents), the authority at whom the protests are targeted, and the environment that allows protesters to coordinate with each other, over time and at a population scale. Where an authority is responsive and failure is contested, a modest-sized conventional movement endured. Where authorities repeatedly and incontrovertibly fail the movement, the population disengaged from action but evidenced an ongoing commitment to radicalism (latent radicalism).

MEDCO: Medical Education Copilots Based on A Multi-Agent Framework

Categories: Artificial Intelligence, Multiagent Systems

Authors: Hao Wei, Jianing Qiu, Haibao Yu, Wu Yuan

Published: 2024-08-22

Link: http://arxiv.org/abs/2408.12496v1

Abstract: Large language models (LLMs) have had a significant impact on diverse research domains, including medicine and healthcare. However, the potential of LLMs as copilots in medical education remains underexplored. Current AI-assisted educational tools are limited by their solitary learning approach and inability to simulate the multi-disciplinary and interactive nature of actual medical training. To address these limitations, we propose MEDCO (Medical EDucation COpilots), a novel multi-agent-based copilot system specially developed to emulate real-world medical training environments. MEDCO incorporates three primary agents: an agentic patient, an expert doctor, and a radiologist, facilitating a multi-modal and interactive learning environment. Our framework emphasizes the learning of proficient question-asking skills, multi-disciplinary collaboration, and peer discussions between students. Our experiments show that simulated virtual students who underwent training with MEDCO not only achieved substantial performance enhancements comparable to those of advanced models, but also demonstrated human-like learning behaviors and improvements, coupled with an increase in the number of learning samples. This work contributes to medical education by introducing a copilot that implements an interactive and collaborative learning approach. It also provides valuable insights into the effectiveness of AI-integrated training paradigms.

Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

Categories: Machine Learning, Artificial Intelligence, Multiagent Systems

Authors: Shresth Verma, Niclas Boehmer, Lingkai Kong, Milind Tambe

Published: 2024-08-22

Link: http://arxiv.org/abs/2408.12112v1

Abstract: LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method, termed Social Choice Language Model, for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM, that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.

Empirical Equilibria in Agent-based Economic systems with Learning agents

Categories: Multiagent Systems, Computer Science and Game Theory, General Economics, Economics

Authors: Kshama Dwarakanath, Svitlana Vyetrenko, Tucker Balch

Published: 2024-08-21

Link: http://arxiv.org/abs/2408.12038v1

Abstract: We present an agent-based simulator for economic systems with heterogeneous households, firms, central bank, and government agents. These agents interact to define production, consumption, and monetary flow. Each agent type has distinct objectives, such as households seeking utility from consumption and the central bank targeting inflation and production. We define this multi-agent economic system using an OpenAI Gym-style environment, enabling agents to optimize their objectives through reinforcement learning. Standard multi-agent reinforcement learning (MARL) schemes, like independent learning, enable agents to learn concurrently but do not address whether the resulting strategies are at equilibrium. This study integrates the Policy Space Response Oracle (PSRO) algorithm, which has shown superior performance over independent MARL in games with homogeneous agents, with economic agent-based modeling. We use PSRO to develop agent policies approximating Nash equilibria of the empirical economic game, thereby linking to economic equilibria. Our results demonstrate that PSRO strategies achieve lower regret values than independent MARL strategies in our economic system with four agent types. This work aims to bridge artificial intelligence, economics, and empirical game theory towards future research.
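The regret values mentioned in the abstract measure how much a player could gain by deviating to a best response, and a profile with zero regret for every player is a Nash equilibrium of the empirical game. A minimal sketch for one player of a two-player matrix game follows; the function name and the matching-pennies payoffs are illustrative assumptions, not the paper's four-agent-type economic game.

```python
def regret(payoffs, own_strategy, other_strategy):
    """Regret of the row player's mixed strategy against a fixed opponent mix.

    payoffs[i][j] is the row player's payoff when it plays pure action i
    and the opponent plays pure action j.
    """
    # Expected payoff of each pure action against the opponent's mix.
    action_values = [
        sum(p * q for p, q in zip(row, other_strategy)) for row in payoffs
    ]
    current_value = sum(w * v for w, v in zip(own_strategy, action_values))
    return max(action_values) - current_value


# Matching pennies from the row player's perspective.
pennies = [[1.0, -1.0], [-1.0, 1.0]]
print(regret(pennies, [0.5, 0.5], [0.5, 0.5]))  # uniform mix is an equilibrium
print(regret(pennies, [1.0, 0.0], [0.0, 1.0]))  # row player is fully exploited
```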

VIRIS: Simulating indoor airborne transmission combining architectural design and people movement

Categories: Computers and Society, Multiagent Systems, Physics and Society

Authors: Yidan Xue, Wassim Jabi, Thomas E. Woolley, Katerina Kaouri

Published: 2024-08-21

Link: http://arxiv.org/abs/2408.11772v1

Abstract: A Viral Infection Risk Indoor Simulator (VIRIS) has been developed to quickly assess and compare mitigations for airborne disease spread. This agent-based simulator combines people movement in an indoor space, viral transmission modelling and detailed architectural design, and it is powered by topologicpy, an open-source Python library. VIRIS generates very fast predictions of the viral concentration and the spatiotemporal infection risk for individuals as they move through a given space. The simulator is validated with data from a courtroom superspreader event. A sensitivity study for unknown parameter values is also performed. We compare several non-pharmaceutical interventions (NPIs) issued in UK government guidance, for two indoor settings: a care home and a supermarket. Additionally, we have developed the user-friendly VIRIS web app that allows quick exploration of diverse scenarios of interest and visualisation, allowing policymakers, architects and space managers to easily design or assess infection risk in an indoor space.

Bayesian Optimization Framework for Efficient Fleet Design in Autonomous Multi-Robot Exploration

Categories: Robotics, Multiagent Systems

Authors: David Molina Concha, Jiping Li, Haoran Yin, Kyeonghyeon Park, Hyun-Rok Lee, Taesik Lee, Dhruv Sirohi, Chi-Guhn Lee

Published: 2024-08-21

Link: http://arxiv.org/abs/2408.11751v1

Abstract: This study addresses the challenge of fleet design optimization in the context of heterogeneous multi-robot fleets, aiming to obtain feasible designs that balance performance and costs. In the domain of autonomous multi-robot exploration, reinforcement learning agents play a central role, offering adaptability to complex terrains and facilitating collaboration among robots. However, modifying the fleet composition results in changes in the learned behavior, and training multi-robot systems using multi-agent reinforcement learning is expensive. Therefore, an exhaustive evaluation of each potential fleet design is infeasible. To tackle these hurdles, we introduce Bayesian Optimization for Fleet Design (BOFD), a framework leveraging multi-objective Bayesian Optimization to explore fleets on the Pareto front of performance and cost while accounting for uncertainty in the design space. Moreover, we establish a sub-linear bound for cumulative regret, supporting BOFD's robustness and efficacy. Extensive benchmark experiments in synthetic and simulated environments demonstrate the superiority of our framework over state-of-the-art methods, achieving efficient fleet designs with minimal fleet evaluations.

Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation

Categories: Multiagent Systems, Artificial Intelligence, Computer Science and Game Theory, Machine Learning, Systems and Control

Authors: Patrick Benjamin, Alessandro Abate

Published: 2024-08-21

Link: http://arxiv.org/abs/2408.11607v1

Abstract: Recent works have provided algorithms by which decentralised agents, which may be connected via a communication network, can learn equilibria in Mean-Field Games from a single, non-episodic run of the empirical system. However, these algorithms are given for tabular settings: this computationally limits the size of players' observation space, meaning that the algorithms are not able to handle anything but small state spaces, nor to generalise beyond policies depending on the ego player's state to so-called 'population-dependent' policies. We address this limitation by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method that has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the population's mean-field distribution in the observation for each player's policy, it is arguably unrealistic to assume that decentralised agents would have access to this global information: we therefore additionally provide new algorithms that allow agents to estimate the global empirical distribution based on a local neighbourhood, and to improve this estimate via communication over a given network. Our experiments showcase how the communication network allows decentralised agents to estimate the mean-field distribution for population-dependent policies, and that exchanging policy information helps networked agents to outperform both independent and even centralised agents in function-approximation settings, by an even greater margin than in tabular settings.

Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration

Categories: Multiagent Systems, Robotics

Authors: Cheng Xu, Changtian Zhang, Yuchen Shi, Ran Wang, Shihong Duan, Yadong Wan, Xiaotong Zhang

Published: 2024-08-21

Link: http://arxiv.org/abs/2408.11416v1

Abstract: Recent advancements in reinforcement learning have made significant impacts across various domains, yet they often struggle in complex multi-agent environments due to issues like algorithm instability, low sampling efficiency, and the challenges of exploration and dimensionality explosion. Hierarchical reinforcement learning (HRL) offers a structured approach to decompose complex tasks into simpler sub-tasks, which is promising for multi-agent settings. This paper advances the field by introducing a hierarchical architecture that autonomously generates effective subgoals without explicit constraints, enhancing both flexibility and stability in training. We propose a dynamic goal generation strategy that adapts based on environmental changes. This method significantly improves the adaptability and sample efficiency of the learning process. Furthermore, we address the critical issue of credit assignment in multi-agent systems by synergizing our hierarchical architecture with a modified QMIX network, thus improving overall strategy coordination and efficiency. Comparative experiments with mainstream reinforcement learning algorithms demonstrate the superior convergence speed and performance of our approach in both single-agent and multi-agent environments, confirming its effectiveness and flexibility in complex scenarios. Our code is open-sourced at https://github.com/SICC-Group/GMAH.

Deep Reinforcement Learning for Decentralized Multi-Robot Control: A DQN Approach to Robustness and Information Integration

Categories: Robotics, Multiagent Systems

Authors: Bin Wu, C Steve Suh

Published: 2024-08-21

Link: http://arxiv.org/abs/2408.11339v1

Abstract: The superiority of Multi-Robot Systems (MRS) in various complex environments is unquestionable. However, in complex situations such as search and rescue, environmental monitoring, and automated production, robots are often required to work collaboratively without a central control unit. This necessitates an efficient and robust decentralized control mechanism to process local information and guide the robots' behavior. In this work, we propose a new decentralized controller design method that utilizes the Deep Q-Network (DQN) algorithm from deep reinforcement learning, aimed at improving the integration of local information and robustness of multi-robot systems. The designed controller allows each robot to make decisions independently based on its local observations while enhancing the overall system's collaborative efficiency and adaptability to dynamic environments through a shared learning mechanism. Through testing in simulated environments, we have demonstrated the effectiveness of this controller in improving task execution efficiency, strengthening system fault tolerance, and enhancing adaptability to the environment. Furthermore, we explored the impact of DQN parameter tuning on system performance, providing insights for further optimization of the controller design. Our research not only showcases the potential application of the DQN algorithm in the decentralized control of multi-robot systems but also offers a new perspective on how to enhance the overall performance and robustness of the system through the integration of local information.

Optimization of Multi-Agent Flying Sidekick Traveling Salesman Problem over Road Networks

Categories: Robotics, Artificial Intelligence, Multiagent Systems

Authors: Ruixiao Yang, Chuchu Fan

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.11187v1

Abstract: Mixed truck-drone delivery systems have attracted increasing attention for last-mile logistics, but real-world complexities demand a shift from single-agent, fully connected graph models to multi-agent systems operating on actual road networks. We introduce the multi-agent flying sidekick traveling salesman problem (MA-FSTSP) on road networks, extending the single truck-drone model to multiple trucks, each carrying multiple drones while considering full road networks for truck restrictions and flexible drone routes. We propose a mixed-integer linear programming model and an efficient three-phase heuristic algorithm for this NP-hard problem. Our approach decomposes MA-FSTSP into manageable subproblems of one truck with multiple drones. Then, it computes the routes for trucks without drones in subproblems, which are used in the final phase as heuristics to help optimize drone and truck routes simultaneously. Extensive numerical experiments on Manhattan and Boston road networks demonstrate our algorithm's superior effectiveness and efficiency, significantly outperforming both column generation and variable neighborhood search baselines in solution quality and computation time. Notably, our approach scales to more than 300 customers within a 5-minute time limit, showcasing its potential for large-scale, real-world logistics applications.

Autonomous Negotiation Using Comparison-Based Gradient Estimation

Categories: Multiagent Systems, Artificial Intelligence, Optimization and Control

Authors: Surya Murthy, Mustafa O. Karabag, Ufuk Topcu

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.11186v1

Abstract: Negotiation is useful for resolving conflicts in multi-agent systems. We explore autonomous negotiation in a setting where two self-interested rational agents sequentially trade items from a finite set of categories. Each agent has a utility function that depends on the amount of items it possesses in each category. The offering agent makes trade offers to improve its utility without knowing the responding agent's utility function, and the responding agent accepts offers that improve its utility. We present a comparison-based algorithm for the offering agent that generates offers through previous acceptance or rejection responses without extensive information sharing. The algorithm estimates the responding agent's gradient by leveraging the rationality assumption and rejected offers to prune the space of potential gradients. After the algorithm makes a finite number of consecutively rejected offers, the responding agent is at a near-optimal state, or the agents' preferences are closely aligned. Additionally, we facilitate negotiations with humans by representing natural language feedback as comparisons that can be integrated into the proposed algorithm. We compare the proposed algorithm against random search baselines in integer and fractional trading scenarios and show that it improves the societal benefit with fewer offers.

Athena: Safe Autonomous Agents with Verbal Contrastive Learning

Categories: Computation and Language, Artificial Intelligence, Multiagent Systems

Authors: Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.11021v1

Abstract: Due to emergent capabilities, large language models (LLMs) have been utilized as language-based agents to perform a variety of tasks and make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using a selection of tools available to them. As the capabilities of the agents expand, ensuring their safety and trustworthiness becomes more imperative. In this study, we introduce the Athena framework which leverages the concept of verbal contrastive learning where past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent towards safety while fulfilling a given task. The framework also incorporates a critiquing mechanism to guide the agent to prevent risky actions at every step. Furthermore, due to the lack of existing benchmarks on the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation, with both closed- and open-source LLMs, indicates verbal contrastive learning and interaction-level critiquing improve the safety rate significantly.

DBHP: Trajectory Imputation in Multi-Agent Sports Using Derivative-Based Hybrid Prediction

Categories: Artificial Intelligence, Machine Learning, Multiagent Systems

Authors: Hanjun Choi, Hyunsung Kim, Minho Lee, Chang-Jo Kim, Jinsung Yoon, Sang-Ki Ko

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.10878v2

Abstract: Many spatiotemporal domains handle multi-agent trajectory data, but in real-world scenarios, collected trajectory data are often partially missing due to various reasons. While existing approaches demonstrate good performance in trajectory imputation, they face challenges in capturing the complex dynamics and interactions between agents due to a lack of physical constraints that govern realistic trajectories, leading to suboptimal results. To address this issue, the paper proposes a Derivative-Based Hybrid Prediction (DBHP) framework that can effectively impute multiple agents' missing trajectories. First, a neural network equipped with Set Transformers produces a naive prediction of missing trajectories while satisfying the permutation-equivariance in terms of the order of input agents. Then, the framework makes alternative predictions leveraging velocity and acceleration information and combines all the predictions with properly determined weights to provide final imputed trajectories. In this way, our proposed framework not only accurately predicts position, velocity, and acceleration values but also enforces the physical relationship between them, eventually improving both the accuracy and naturalness of the predicted trajectories. Accordingly, experimental results on imputing player trajectories in team sports show that our framework significantly outperforms existing imputation baselines.
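The idea of combining a direct position prediction with an alternative derived from velocity can be sketched in a few lines. This is a simplified illustration under stated assumptions: one forward Euler step stands in for the derivative-based prediction, and a fixed blending weight replaces the "properly determined weights" the abstract mentions. All names and numbers are hypothetical.

```python
def integrate_velocity(prev_position, velocity, dt):
    """Position implied by the previous position and a predicted velocity."""
    return tuple(p + v * dt for p, v in zip(prev_position, velocity))


def hybrid_prediction(direct, prev_position, velocity, dt, weight_direct=0.5):
    """Blend a direct position prediction with the velocity-implied position.

    weight_direct is a fixed, assumed blending weight between 0 and 1.
    """
    from_velocity = integrate_velocity(prev_position, velocity, dt)
    return tuple(
        weight_direct * d + (1.0 - weight_direct) * f
        for d, f in zip(direct, from_velocity)
    )


# Hypothetical player state: last known position, predicted velocity,
# and a network's direct guess for the missing position.
print(hybrid_prediction(direct=(1.0, 1.0), prev_position=(0.0, 0.0),
                        velocity=(2.0, 0.0), dt=1.0))
```

Tying the position estimate to the velocity estimate is one simple way to keep an imputed trajectory physically plausible between observed frames.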

Multi-Agent Based Simulation for Decentralized Electric Vehicle Charging Strategies and their Impacts

Categories: Multiagent Systems

Authors: Kristoffer Christensen, Bo Nørregaard Jørgensen, Zheng Grace Ma

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.10790v1

Abstract: The growing shift towards a Smart Grid involves integrating numerous new digital energy solutions into the energy ecosystems to address problems arising from the transition to carbon neutrality, particularly in linking the electricity and transportation sectors. Yet, this shift brings challenges due to mass electric vehicle (EV) adoption and the lack of methods to adequately assess various EV charging algorithms and their ecosystem impacts. This paper introduces a multi-agent based simulation model, validated through a case study of a Danish radial distribution network serving 126 households. The study reveals that traditional charging leads to grid overload by 2031 at 67% EV penetration, while decentralized strategies like Real-Time Pricing could cause overloads as early as 2028. The developed multi-agent based simulation demonstrates its ability to offer detailed, hourly analysis of future load profiles in distribution grids and can therefore be applied to other prospective scenarios in similar energy systems.

Multi-agent based modeling for investigating excess heat utilization from electrolyzer production to district heating network

Categories: Multiagent Systems

Authors: Kristoffer Christensen, Bo Nørregaard Jørgensen, Zheng Grace Ma

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.10783v1

Abstract: Power-to-Hydrogen is crucial for the renewable energy transition, yet existing literature lacks business models for the significant excess heat it generates. This study addresses this by evaluating three models for selling electrolyzer-generated heat to district heating grids: constant, flexible, and renewable-source hydrogen production, with and without heat sales. Using agent-based modeling and multi-criteria decision-making methods (VIKOR, TOPSIS, PROMETHEE), it finds that selling excess heat can cut hydrogen production costs by 5.6%. The optimal model operates flexibly with electricity spot prices, includes heat sales, and maintains a hydrogen price of 3.3 EUR/kg. Environmentally, hydrogen production from grid electricity could emit up to 13,783.8 tons of CO2 over four years from 2023. The best economic and environmental model uses renewable sources and sells heat at 3.5 EUR/kg.

Multi-Agent Based Simulation for Investigating Centralized Charging Strategies and their Impact on Electric Vehicle Home Charging Ecosystem

Categories: Multiagent Systems

Authors: Kristoffer Christensen, Bo Nørregaard Jørgensen, Zheng Grace Ma

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.10773v1

Abstract: This paper addresses the critical integration of electric vehicles (EVs) into the electricity grid, which is essential for achieving carbon neutrality by 2050. The rapid increase in EV adoption poses significant challenges to the existing grid infrastructure, particularly in managing the increasing electricity demand and mitigating the risk of grid overloads. Centralized EV charging strategies are investigated due to their potential to optimize grid stability and efficiency, compared to decentralized approaches that may exacerbate grid stress. Utilizing a multi-agent based simulation model, the study provides a realistic representation of the electric vehicle home charging ecosystem in a case study of Strib, Denmark. The findings show that, in terms of EV user satisfaction, the Earliest-deadline-first and Round Robin strategies perform best at 100% EV adoption. The simulation considers a realistic adoption curve, EV charging strategies, EV models, and driving patterns to capture the full ecosystem dynamics over a long-term period with high (hourly) resolution. Additionally, the study offers detailed load profiles for future distribution grids, demonstrating how centralized charging strategies can efficiently manage grid loads and prevent overloads.
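
To make the two centralized strategies named in the abstract concrete, here is a minimal sketch of how Earliest-deadline-first and Round Robin could order vehicles under a shared hourly capacity. It is an illustrative toy, not the simulation model from the paper: the `EV` class, the `allocate` helper, and all numeric values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class EV:
    name: str
    energy_needed: float  # kWh still required (hypothetical field)
    deadline: int         # hour by which charging should be finished


def edf_order(evs):
    """Earliest-deadline-first: the vehicle with the nearest deadline goes first."""
    return sorted(evs, key=lambda ev: ev.deadline)


def round_robin_order(evs, hour):
    """Round Robin: rotate the starting vehicle every hour so each gets a turn."""
    if not evs:
        return []
    shift = hour % len(evs)
    return evs[shift:] + evs[:shift]


def allocate(order, capacity_kw, charger_kw):
    """Hand out power for one hour in priority order until capacity is used up.

    With a one-hour step, kW delivered equals kWh charged.
    """
    plan = {}
    remaining = capacity_kw
    for ev in order:
        power = min(charger_kw, ev.energy_needed, remaining)
        if power <= 0:
            continue
        plan[ev.name] = power
        remaining -= power
    return plan


if __name__ == "__main__":
    fleet = [EV("a", 10.0, 8), EV("b", 10.0, 6), EV("c", 10.0, 7)]
    print(allocate(edf_order(fleet), capacity_kw=15.0, charger_kw=11.0))
    print(allocate(round_robin_order(fleet, hour=2), capacity_kw=15.0, charger_kw=11.0))
```

The only difference between the two strategies in this sketch is the ordering function; the capacity check that prevents overloads is shared.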

Analyzing the Impact of Electric Vehicles on Local Energy Systems using Digital Twins

Categories: Multiagent Systems, I.6.5

Authors: Daniel René Bayer, Marco Pruckner

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.10763v1

Abstract: The electrification of the transportation and heating sectors, the so-called sector coupling, is one of the core elements in achieving independence from fossil fuels. As it strongly affects electricity demand, especially at the local level, the integrated modeling and simulation of all sectors is a promising approach for analyzing design decisions or complex control strategies. This paper analyzes the increase in electricity demand resulting from sector coupling, mainly due to integrating electric vehicles into urban energy systems. To this end, we utilize a digital twin of an existing local energy system and extend it with a mobility simulation model to evaluate the impact of electric vehicles at the distribution grid level. Our findings indicate a significant rise in annual electricity consumption attributed to electric vehicles, with home charging alone resulting in a 78% increase. However, we demonstrate that integrating photovoltaic and battery energy storage systems can effectively mitigate this rise.

Synchronization behind Learning in Periodic Zero-Sum Games Triggers Divergence from Nash equilibrium

Categories: Computer Science and Game Theory, Multiagent Systems, Optimization and Control, Chaotic Dynamics

Authors: Yuma Fujimoto, Kaito Ariu, Kenshi Abe

Published: 2024-08-20

Link: http://arxiv.org/abs/2408.10595v1

Abstract: Learning in zero-sum games studies a situation where multiple agents competitively learn their strategy. In such multi-agent learning, we often see that the strategies cycle around their optimum, i.e., Nash equilibrium. When a game periodically varies (called a "periodic" game), however, the Nash equilibrium moves generically. How learning dynamics behave in such periodic games is of interest but still unclear. Interestingly, we discover that the behavior is highly dependent on the relationship between the two speeds at which the game changes and at which players learn. We observe that when these two speeds synchronize, the learning dynamics diverge, and their time-average does not converge. Otherwise, the learning dynamics draw complicated cycles, but their time-average converges. Under some assumptions introduced for the dynamical systems analysis, we prove that this behavior occurs. Furthermore, our experiments observe this behavior even when these assumptions are removed. This study discovers a novel phenomenon, i.e., synchronization, and gains insight widely applicable to learning in periodic games.

Tax Credits and Household Behavior: The Roles of Myopic Decision-Making and Liquidity in a Simulated Economy

Categories: Multiagent Systems, General Economics, Economics

Authors: Jialin Dong, Kshama Dwarakanath, Svitlana Vyetrenko

Published: 2024-08-19

Link: http://arxiv.org/abs/2408.10391v1

Abstract: There has been a growing interest in multi-agent simulators in the domain of economic modeling. However, contemporary research often involves developing reinforcement learning (RL) based models that focus solely on a single type of agent, such as households, firms, or the government. Such an approach overlooks the adaptation of interacting agents, thereby failing to capture the complexity of real-world economic systems. In this work, we consider a multi-agent simulator composed of RL agents of numerous types, including heterogeneous households, a firm, a central bank, and a government. In particular, we focus on the crucial role of the government in distributing tax credits to households. We conduct two broad categories of comprehensive experiments dealing with the impact of tax credits on 1) households with varied degrees of myopia (short-sightedness in spending and saving decisions), and 2) households with diverse liquidity profiles. The first category of experiments examines the impact of the frequency of tax credits (e.g., annual vs. quarterly) on consumption patterns of myopic households. The second category of experiments focuses on the impact of varying tax credit distribution strategies on households with differing liquidities. We validate our simulation model by reproducing trends observed in real households upon receipt of unforeseen, uniform tax credits, as documented in a JPMorgan Chase report. Based on the results of the latter, we propose an innovative tax credit distribution strategy for the government to reduce inequality among households. We demonstrate the efficacy of this strategy in improving social welfare in our simulation results.

Auctioning Escape Permits for Multiple Correlated Pollutants Using CMRA

Categories: Computer Science and Game Theory, Multiagent Systems

Authors: Keshav Goyal, Sooraj Sathish, Shrisha Rao

Published: 2024-08-19

Link: http://arxiv.org/abs/2408.10148v1

Abstract: In the context of increasingly complex environmental challenges, effective pollution control mechanisms are crucial. By extending state-of-the-art auction mechanisms, we aim to develop an efficient approach for allocating pollution abatement resources in a multi-pollutant setting with pollutants affecting each other's reduction costs. We modify the Combinatorial Multi-Round Ascending Auction for the auction of escape permits of pollutants with co-dependent reduction processes, specifically, greenhouse gas emissions and nutrient runoff in Finnish agriculture. We show the significant advantages of this mechanism in pollution control through experiments on the bid prices and the number of escape permits sold in multiple auction simulations.

Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version)

Categories: Computer Science and Game Theory, Artificial Intelligence, Multiagent Systems

Authors: Muhammad Najib, Giuseppe Perelli

Published: 2024-08-19

Link: http://arxiv.org/abs/2408.10074v1

Abstract: Mechanism design is a well-established game-theoretic paradigm for designing games to achieve desired outcomes. This paper addresses a closely related but distinct concept, equilibrium design. Unlike mechanism design, the designer's authority in equilibrium design is more constrained; she can only modify the incentive structures in a given game to achieve certain outcomes without the ability to create the game from scratch. We study the problem of equilibrium design using dynamic incentive structures, known as reward machines. We use weighted concurrent game structures for the game model, with goals (for the players and the designer) defined as mean-payoff objectives. We show how reward machines can be used to represent dynamic incentives that allocate rewards in a manner that optimises the designer's goal. We also introduce the main decision problem within our framework, the payoff improvement problem. This problem essentially asks whether there exists a dynamic incentive (represented by some reward machine) that can improve the designer's payoff by more than a given threshold value. We present two variants of the problem: strong and weak. We demonstrate that both can be solved in polynomial time using a Turing machine equipped with an NP oracle. Furthermore, we also establish that these variants are either NP-hard or coNP-hard. Finally, we show how to synthesise the corresponding reward machine if it exists.

MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems

Categories: Multiagent Systems

Authors: Qian Wang, Tianyu Wang, Qinbin Li, Jingsheng Liang, Bingsheng He

Published: 2024-08-19

Link: http://arxiv.org/abs/2408.09955v2

Abstract: With the emergence of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, their agents mostly follow predefined Standard Operating Procedures (SOPs) that remain unchanged across the whole interaction, lacking autonomy and scalability. Additionally, current solutions often overlook the necessity for effective agent cooperation. To address the above limitations, we propose MegaAgent, a practical framework designed for autonomous cooperation in large-scale LLM agent systems. MegaAgent leverages the autonomy of agents to dynamically generate agents based on task requirements, incorporating features such as automatically dividing tasks, systematic planning and monitoring of agent activities, and managing concurrent operations. In addition, MegaAgent is designed with a hierarchical structure and employs system-level parallelism to enhance performance and boost communication. We demonstrate the effectiveness of MegaAgent through Gobang game development, showing that it outperforms popular LLM-MA systems; and national policy simulation, demonstrating its high autonomy and potential to rapidly scale up to 590 agents while ensuring effective cooperation among them. Our results indicate that MegaAgent is the first autonomous large-scale LLM-MA system with no predefined SOPs, high effectiveness and scalability, paving the way for further research in this field. Our code is at https://anonymous.4open.science/r/MegaAgent-81F3.

Algorithmic Contract Design with Reinforcement Learning Agents

Categories: Multiagent Systems

Authors: David Molina Concha, Kyeonghyeon Park, Hyun-Rok Lee, Taesik Lee, Chi-Guhn Lee

Published: 2024-08-19

Link: http://arxiv.org/abs/2408.09686v1

Abstract: We introduce a novel problem setting for algorithmic contract design, named the principal-MARL contract design problem. This setting extends traditional contract design to account for dynamic and stochastic environments using Markov Games and Multi-Agent Reinforcement Learning (MARL). To tackle this problem, we propose a Multi-Objective Bayesian Optimization (MOBO) framework named Constrained Pareto Maximum Entropy Search (cPMES). Our approach integrates MOBO and MARL to explore the highly constrained contract design space, identifying promising incentive and recruitment decisions. cPMES transforms the principal-MARL contract design problem into an unconstrained multi-objective problem, leveraging the probability of feasibility as part of the objectives and ensuring promising designs predicted on the feasibility border are included in the Pareto front. By focusing the entropy prediction on designs within the Pareto set, cPMES mitigates the risk of the search strategy being overwhelmed by entropy from constraints. We demonstrate the effectiveness of cPMES through extensive benchmark studies in synthetic and simulated environments, showing its ability to find feasible contract designs that maximize the principal's objectives. Additionally, we provide theoretical support with a sub-linear regret bound concerning the number of iterations.

Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

Categories: Artificial Intelligence, Multiagent Systems, Robotics

Authors: Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll

Published: 2024-08-19

Link: http://arxiv.org/abs/2408.09675v1

Abstract: Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL to the multi-agent system domain, multi-agent RL (MARL) not only needs to learn the control policy but also has to account for interactions with all other agents in the environment, mutual influences among different system components, and the distribution of computational resources. This increases the complexity of algorithmic design and places higher demands on computational resources. At the same time, simulators are crucial for obtaining realistic data, which is fundamental to RL. In this paper, we first propose a series of metrics for simulators and summarize the features of existing benchmarks. Second, to ease comprehension, we recall the foundational knowledge and then synthesize recent advances in MARL-related autonomous driving and intelligent transportation systems. Specifically, we examine their environmental modeling, state representation, perception units, and algorithm design. Finally, we discuss open challenges as well as prospects and opportunities. We hope this paper can help researchers integrate MARL technologies and trigger more insightful ideas toward intelligent and autonomous driving.

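GNN-Empowered Effective Partial Observation MARL Method for AoI Management in Multi-UAV Network

Categories: Information Theory, Machine Learning, Multiagent Systems, Systems and Control

Authors: Yuhao Pan, Xiucheng Wang, Zhiyao Xu, Nan Cheng, Wenchao Xu, Jun-jie Zhang

Published: 2024-08-18

Link: http://arxiv.org/abs/2409.00036v1

Abstract: Unmanned aerial vehicles (UAVs), owing to their low cost and high flexibility, have been widely used in various scenarios to enhance network performance. However, optimizing UAV trajectories in unknown areas, or in areas without sufficient prior information, still suffers from poor planning performance and poor distributed execution. These challenges arise when UAVs rely solely on their own observations and on information from other UAVs within their communication range, without access to global information. To address these challenges, this paper proposes the Qedgix framework, which combines graph neural networks (GNNs) and the QMIX algorithm to achieve distributed optimization of the Age of Information (AoI) for users in unknown scenarios. The framework uses GNNs to extract information from the UAVs, from users within the observable range, and from other UAVs within the communication range, thereby enabling effective UAV trajectory planning. Because of the discretized and temporal nature of the AoI metric, the Qedgix framework employs QMIX to optimize a decentralized partially observable Markov decision process (Dec-POMDP) under centralized training and distributed execution (CTDE) with respect to the users' average AoI. By modeling the UAV network optimization problem in terms of AoI and applying the Kolmogorov-Arnold representation theorem, the Qedgix framework achieves efficient neural network training through parameter sharing based on permutation invariance. Simulation results show that the proposed algorithm significantly improves convergence speed while reducing the users' average AoI. The code is available at https://github.com/UNIC-Lab/Qedgix.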

Beyond Local Views: Global State Inference with Diffusion Models for Cooperative Multi-Agent Reinforcement Learning

Categories: Multiagent Systems, Artificial Intelligence

Authors: Zhiwei Xu, Hangyu Mao, Nianmin Zhang, Xin Xin, Pengjie Ren, Dapeng Li, Bin Zhang, Guoliang Fan, Zhumin Chen, Changwei Wang, Jiangjin Yin

Published: 2024-08-18

Link: http://arxiv.org/abs/2408.09501v1

Abstract: In partially observable multi-agent systems, agents typically only have access to local observations. This severely hinders their ability to make precise decisions, particularly during decentralized execution. To alleviate this problem and inspired by image outpainting, we propose State Inference with Diffusion Models (SIDIFF), which uses diffusion models to reconstruct the original global state based solely on local observations. SIDIFF consists of a state generator and a state extractor, which allow agents to choose suitable actions by considering both the reconstructed global state and local observations. In addition, SIDIFF can be effortlessly incorporated into current multi-agent reinforcement learning algorithms to improve their performance. Finally, we evaluated SIDIFF on different experimental platforms, including Multi-Agent Battle City (MABC), a novel and flexible multi-agent reinforcement learning environment we developed. SIDIFF achieved desirable results and outperformed other popular algorithms.

Value-Enriched Population Synthesis: Integrating a Motivational Layer

Categories: Multiagent Systems

Authors: Alba Aguilera, Miquel Albertí, Nardine Osman, Georgina Curto

Published: 2024-08-18

Link: http://arxiv.org/abs/2408.09407v1

Abstract: In recent years, computational improvements have allowed for more nuanced, data-driven and geographically explicit agent-based simulations. So far, simulations have struggled to adequately represent the attributes that motivate the actions of the agents. In fact, existing population synthesis frameworks generate agent profiles limited to socio-demographic attributes. In this paper, we introduce a novel value-enriched population synthesis framework that integrates a motivational layer with the traditional individual and household socio-demographic layers. Our research highlights the significance of extending the profile of agents in synthetic populations by incorporating data on values, ideologies, opinions and vital priorities, which motivate the agents' behaviour. This motivational layer can help us develop a more nuanced decision-making mechanism for the agents in social simulation settings. Our methodology integrates microdata and macrodata within different Bayesian network structures. This contribution makes it possible to generate synthetic populations with integrated value systems that preserve the inherent socio-demographic distributions of the real population in any specific region.

Joint-perturbation simultaneous pseudo-gradient

Categories: Computer Science and Game Theory, Multiagent Systems

Authors: Carlos Martin, Tuomas Sandholm

Published: 2024-08-17

Link: http://arxiv.org/abs/2408.09306v1

Abstract: We study the problem of computing an approximate Nash equilibrium of a game whose strategy space is continuous without access to gradients of the utility function. Such games arise, for example, when players' strategies are represented by the parameters of a neural network. Lack of access to gradients is common in reinforcement learning settings, where the environment is treated as a black box, as well as equilibrium finding in mechanisms such as auctions, where the mechanism's payoffs are discontinuous in the players' actions. To tackle this problem, we turn to zeroth-order optimization techniques that combine pseudo-gradients with equilibrium-finding dynamics. Specifically, we introduce a new technique that requires a number of utility function evaluations per iteration that is constant rather than linear in the number of players. It achieves this by performing a single joint perturbation on all players' strategies, rather than perturbing each one individually. This yields a dramatic improvement for many-player games, especially when the utility function is expensive to compute in terms of wall time, memory, money, or other resources. We evaluate our approach on various games, including auctions, which have important real-world applications. Our approach yields a significant reduction in the run time required to reach an approximate Nash equilibrium.
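
The idea of a single joint perturbation can be sketched with an SPSA-style antithetic estimator. This is only an illustration of the evaluation-count argument, not the estimator from the paper: the function name `joint_pseudo_gradient`, the Rademacher (+1/-1) perturbations, and the assumption that one call to `utilities` returns every player's payoff are choices made here.

```python
import random


def joint_pseudo_gradient(utilities, strategies, sigma=0.1, rng=random):
    """Estimate each player's utility gradient with respect to its own strategy.

    `utilities(profile)` is assumed to return a list with one payoff per player.
    Only two evaluations are made, regardless of the number of players.
    """
    # Draw one +1/-1 perturbation per coordinate, for all players at once.
    z = [[rng.choice((-1.0, 1.0)) for _ in s] for s in strategies]
    plus = [[x + sigma * d for x, d in zip(s, zs)] for s, zs in zip(strategies, z)]
    minus = [[x - sigma * d for x, d in zip(s, zs)] for s, zs in zip(strategies, z)]

    u_plus = utilities(plus)    # evaluation 1: all players perturbed forwards
    u_minus = utilities(minus)  # evaluation 2: all players perturbed backwards

    grads = []
    for i, zs in enumerate(z):
        # Central difference of player i's payoff along the joint direction,
        # projected back onto player i's own coordinates.
        scale = (u_plus[i] - u_minus[i]) / (2.0 * sigma)
        grads.append([scale * d for d in zs])
    return grads


if __name__ == "__main__":
    targets = [1.0, -2.0, 0.5]

    def toy_utilities(profile):
        # Each player wants its scalar strategy to be close to its own target.
        return [-(s[0] - t) ** 2 for s, t in zip(profile, targets)]

    print(joint_pseudo_gradient(toy_utilities, [[0.0], [0.0], [0.0]]))
```

Perturbing players one at a time would need two evaluations per player; here the count stays at two, at the price of extra noise whenever a player's payoff also depends on the other players' perturbed strategies.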

ASGM-KG: Unveiling Alluvial Gold Mining Through Knowledge Graphs

Categories: Artificial Intelligence, Information Retrieval, Machine Learning, Multiagent Systems

Authors: Debashis Gupta, Aditi Golder, Luis Fernendez, Miles Silman, Greg Lersen, Fan Yang, Bob Plemmons, Sarra Alqahtani, Paul Victor Pauca

Published: 2024-08-16

Link: http://arxiv.org/abs/2408.08972v1

Abstract: Artisanal and Small-Scale Gold Mining (ASGM) is a low-cost yet highly destructive mining practice, leading to environmental disasters across the world's tropical watersheds. The topic of ASGM spans multiple domains of research and information, including natural and social systems, and knowledge is often atomized across a diversity of media and documents. We therefore introduce a knowledge graph (ASGM-KG) that consolidates and provides crucial information about ASGM practices and their environmental effects. The current version of ASGM-KG consists of 1,899 triples extracted using a large language model (LLM) from documents and reports published by both non-governmental and governmental organizations. These documents were carefully selected by a group of tropical ecologists with expertise in ASGM. This knowledge graph was validated using two methods. First, a small team of ASGM experts reviewed and labeled triples as factual or non-factual. Second, we devised and applied an automated factual reduction framework that relies on a search engine and an LLM for labeling triples. Our framework performs as well as five baselines on a publicly available knowledge graph and achieves over 90% accuracy on our ASGM-KG validated by domain experts. ASGM-KG demonstrates an advancement in knowledge aggregation and representation for complex, interdisciplinary environmental crises such as ASGM.

The computational power of a human society: a new model of social evolution

Categories: Multiagent Systems, General Economics, Physics and Society, Economics

Authors: David H. Wolpert, Kyle Harper

Published: 2024-08-16

Link: http://arxiv.org/abs/2408.08861v1

Abstract: Social evolutionary theory seeks to explain increases in the scale and complexity of human societies, from origins to present. Over the course of the twentieth century, social evolutionary theory largely fell out of favor as a way of investigating human history, just as advances in complex systems science and computer science saw the emergence of powerful new conceptions of complex systems, and in particular new methods of measuring complexity. We propose that these advances in our understanding of complex systems and computer science should be brought to bear on our investigations into human history. To that end, we present a new framework for modeling how human societies co-evolve with their biotic environments, recognizing that both a society and its environment are computers. This leads us to model the dynamics of each of those two systems using the same, new kind of computational machine, which we define here. For simplicity, we construe a society as a set of interacting occupations and technologies. Similarly, under such a model, a biotic environment is a set of interacting distinct ecological and climatic processes. This provides novel ways to characterize social complexity, which we hope will cast new light on the archaeological and historical records. Our framework also provides a natural way to formalize both the energetic (thermodynamic) costs required by a society as it runs, and the ways it can extract thermodynamic resources from the environment in order to pay for those costs -- and perhaps to grow with any left-over resources.

AgentSimulator: An Agent-based Approach for Data-driven Business Process Simulation

Categories: Multiagent Systems, Artificial Intelligence

Authors: Lukas Kirchdorfer, Robert Blümel, Timotheus Kampik, Han van der Aa, Heiner Stuckenschmidt

Published: 2024-08-16

Link: http://arxiv.org/abs/2408.08571v1

Abstract: Business process simulation (BPS) is a versatile technique for estimating process performance across various scenarios. Traditionally, BPS approaches employ a control-flow-first perspective by enriching a process model with simulation parameters. Although such approaches can mimic the behavior of centrally orchestrated processes, such as those supported by workflow systems, current control-flow-first approaches cannot faithfully capture the dynamics of real-world processes that involve distinct resource behavior and decentralized decision-making. Recognizing this issue, this paper introduces AgentSimulator, a resource-first BPS approach that discovers a multi-agent system from an event log, modeling distinct resource behaviors and interaction patterns to simulate the underlying process. Our experiments show that AgentSimulator achieves state-of-the-art simulation accuracy with significantly lower computation times than existing approaches while providing high interpretability and adaptability to different types of process-execution scenarios.

Multilevel Graph Reinforcement Learning for Consistent Cognitive Decision-making in Heterogeneous Mixed Autonomy

Categories: Multiagent Systems

Authors: Xin Gao, Zhaoyang Ma, Xueyuan Li, Xiaoqiang Meng, Zirui Li

Published: 2024-08-16

Link: http://arxiv.org/abs/2408.08516v1

Abstract: In the realm of heterogeneous mixed autonomy, vehicles experience dynamic spatial correlations and nonlinear temporal interactions in a complex, non-Euclidean space. These complexities pose significant challenges to traditional decision-making frameworks. Addressing this, we propose a hierarchical reinforcement learning framework integrated with multilevel graph representations, which effectively comprehends and models the spatiotemporal interactions among vehicles navigating through uncertain traffic conditions with varying decision-making systems. Rooted in multilevel graph representation theory, our approach encapsulates spatiotemporal relationships inherent in non-Euclidean spaces. A weighted graph represents spatiotemporal features between nodes, addressing the degree imbalance inherent in dynamic graphs. We integrate asynchronous parallel hierarchical reinforcement learning with a multilevel graph representation and a multi-head attention mechanism, which enables connected autonomous vehicles (CAVs) to exhibit capabilities akin to human cognition, facilitating consistent decision-making across various critical dimensions. The proposed decision-making strategy is validated in challenging environments characterized by high density, randomness, and dynamism on highway roads. We assess the performance of our framework through ablation studies, comparative analyses, and spatiotemporal trajectory evaluations. This study presents a quantitative analysis of decision-making mechanisms mirroring human cognitive functions in the realm of heterogeneous mixed autonomy, promoting the development of multi-dimensional decision-making strategies and a sophisticated distribution of attentional resources.

Data-driven Construction of Finite Abstractions for Interconnected Systems: A Compositional Approach

Categories: Systems and Control, Multiagent Systems

Authors: Daniel Ajeleye, Majid Zamani

Published: 2024-08-16

Link: http://arxiv.org/abs/2408.08497v1

Abstract: Finite-state abstractions (a.k.a. symbolic models) present a promising avenue for the formal verification and synthesis of controllers in continuous-space control systems. These abstractions provide simplified models that capture the fundamental behaviors of the original systems. However, the creation of such abstractions typically relies on the availability of precise knowledge concerning system dynamics, which might not be available in many real-world applications. In this work, we introduce an innovative, data-driven, and compositional approach to generate finite abstractions for interconnected systems that consist of discrete-time control subsystems with unknown dynamics. These subsystems interact through an unknown static interconnection map. Our methodology for abstracting the interconnected system involves constructing abstractions for individual subsystems and incorporating an abstraction of the interconnection map.

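Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

Categories: Machine Learning, Computer Science and Game Theory, Multiagent Systems, Optimization and Control

Authors: Chenyu Zhang, Xu Chen, Xuan Di

Published: 2024-08-15

Link: http://arxiv.org/abs/2408.08192v1

Abstract: Mean field games (MFGs) model the interactions within a large-population multi-agent system using the population distribution. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), which computes best responses and the induced population distribution separately and sequentially. However, FPI-type methods are inefficient and unstable because of oscillations caused by the forward-backward procedure. This paper considers an online learning method for MFGs in which an agent updates its policy and its population estimate simultaneously and fully asynchronously, resulting in a simple stochastic gradient descent (SGD) type method called SemiSGD. SemiSGD not only exhibits numerical stability and efficiency but also offers a novel perspective by treating the value function and the population distribution as a unified parameter. We show theoretically that SemiSGD directs this unified parameter along a descent direction toward the mean field equilibrium. Motivated by this perspective, we develop a linear function approximation (LFA) for both the value function and the population distribution, resulting in the first population-aware LFA for MFGs on continuous state-action spaces. Finite-time convergence and approximation error analyses are provided for SemiSGD equipped with population-aware LFA.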

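EmBARDiment: an Embodied AI Agent for Productivity in XR

Categories: Human-Computer Interaction, Multiagent Systems

Authors: Riccardo Bovo, Steven Abreu, Karan Ahuja, Eric J Gonzalez, Li-Te Cheng, Mar Gonzalez-Franco

Published: 2024-08-15

Link: http://arxiv.org/abs/2408.08158v1

Abstract: XR devices running chatbots powered by large language models (LLMs) have tremendous potential as always-on agents that can enable much better productivity scenarios. However, screen-based chatbots do not take advantage of the full suite of natural inputs available in XR, including inward-facing sensor data; instead they over-rely on explicit voice or text prompts, sometimes paired with multi-modal data dropped in as part of the query. We propose a solution that leverages an attention framework that derives context implicitly from user actions, eye gaze, and contextual memory within the XR environment. This minimizes the need for engineered explicit prompts, fostering grounded and intuitive interactions that glean user insights for the chatbot. Our user studies demonstrate the imminent feasibility and transformative potential of our approach to streamlining user interaction with chatbots in XR, while offering insights for the design of future XR-embodied LLM agents.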

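Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players

Categories: Machine Learning, Computer Science and Game Theory, Multiagent Systems

Authors: Pragnya Alatur, Anas Barakat, Niao He

Published: 2024-08-15

Link: http://arxiv.org/abs/2408.08075v1

Abstract: Markov Potential Games (MPGs) form an important sub-class of Markov games, which are a common framework for modeling multi-agent reinforcement learning problems. In particular, MPGs include as a special case the identical-interest setting, in which all agents share the same reward function. Scaling the performance of Nash equilibrium learning algorithms to a large number of agents is crucial for multi-agent systems. To address this important challenge, we focus on the independent learning setting, where agents can only access their local information to update their own policy. In prior work on MPGs, the iteration complexity for obtaining $\epsilon$-Nash regret scales linearly with the number of agents $N$. In this work, we investigate the iteration complexity of an independent policy mirror descent (PMD) algorithm for MPGs. We show that PMD with KL regularization, also known as natural policy gradient, enjoys a better $\sqrt{N}$ dependence on the number of agents, improving over PMD with Euclidean regularization and over prior work. Furthermore, the iteration complexity is also independent of the sizes of the agents' action spaces.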
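
For intuition on why KL-regularized PMD is also called natural policy gradient: on the probability simplex, mirror descent with a KL (negative-entropy) regularizer has the well-known closed form of a multiplicative-weights update. The sketch below shows that generic update for a single agent in a single state; the function name `kl_pmd_step`, the step size `eta`, and the toy action values are assumptions, and the paper's exact choice of value estimates and step sizes is not given in the abstract.

```python
import math


def kl_pmd_step(policy, q_values, eta):
    """One mirror-descent step with KL regularization on a single state.

    Closed form: new_policy[a] is proportional to policy[a] * exp(eta * q_values[a]).
    """
    weights = [p * math.exp(eta * q) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    policy = [0.5, 0.5]
    # Hypothetical action values seen by one independent learner.
    for _ in range(3):
        policy = kl_pmd_step(policy, q_values=[1.0, 0.0], eta=0.5)
    print(policy)
```

In the independent-learning setting described above, each agent would run such an update using only its own local value estimates.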

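Chronological Ad Hoc Resource Sharing for Independent Robotic Agents

Categories: Robotics, Multiagent Systems

Authors: Arjo Chakravarty, Michael X. Grey, M. A. Viraj J. Muthugala, Mohan Rajesh Elara

Published: 2024-08-15

Link: http://arxiv.org/abs/2408.07942v1

Abstract: Resource sharing is a crucial part of a multi-robot system. We propose a Boolean satisfiability (SAT) based approach to resource sharing. Our key contribution is an algorithm for converting any constrained assignment into a weighted-SAT based optimization. We propose a theorem that allows optimal resource assignment problems to be solved via repeated application of a SAT solver. Additionally, we show a way to encode continuous-time ordering constraints using Conjunctive Normal Form (CNF). We benchmark our new algorithms and show that they can be used in an ad hoc setting. We test our algorithms on a fleet of simulated and real-world robots and show that the algorithms are able to handle real-world situations. Our algorithms and test harnesses are open source and built on top of the Open-RMF fleet management system.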

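Nah Bandit: Modeling User Non-compliance in Recommendation Systems

Categories: Machine Learning, Information Retrieval, Multiagent Systems, Systems and Control

Authors: Tianyue Zhou, Jung-Hoon Cho, Cathy Wu

Published: 2024-08-15

Link: http://arxiv.org/abs/2408.07897v1

Abstract: Recommender systems now pervade the digital world, ranging from advertising to entertainment. However, it remains challenging to implement effective recommender systems in the real world, for example in mobility or health. This work focuses on a key challenge: in the real world, it is often easy for the user to opt out of taking any recommendation if none is to her liking, and to fall back on her baseline behavior. It is thus crucial in cyber-physical recommender systems to operate with an interaction model that is aware of such user behavior, lest the user abandon the recommendations altogether. This paper therefore introduces the Nah Bandit, a tongue-in-cheek reference to a bandit problem in which users can say "no" to the recommendation and opt for their preferred option instead. As such, this problem lies between a typical bandit setup and supervised learning. We model user non-compliance by parameterizing an anchoring effect of recommendations on users. We then propose the Expert with Clustering (EWC) algorithm, a hierarchical approach that incorporates feedback from both recommended and non-recommended options to accelerate user preference learning. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ clusters, EWC achieves a regret bound of $O(N\sqrt{T\log K} + NT)$, giving superior theoretical performance in the short term compared with the LinUCB algorithm. Experimental results also show that EWC outperforms both supervised learning and traditional contextual bandit approaches. This advancement shows that effectively using non-compliance feedback can accelerate preference learning and improve recommendation accuracy. This work lays the foundation for future research on the Nah Bandit, providing a robust framework for more effective recommender systems.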

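SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning

Categories: Robotics, Machine Learning, Multiagent Systems, Systems and Control

Authors: Jianye Xu, Pan Hu, Bassam Alrifaee

Published: 2024-08-14

Link: http://arxiv.org/abs/2408.07644v1

Abstract: This paper introduces an open-source, decentralized framework named SigmaRL, designed to enhance both the sample efficiency and the generalization of multi-agent reinforcement learning (RL) for motion planning of connected and automated vehicles. Most RL agents exhibit a limited capacity to generalize, often focusing narrowly on specific scenarios, and are usually evaluated in scenarios similar or even identical to those seen during training. Various methods have been proposed to address these challenges, including experience replay and regularization. However, how observation design in RL affects sample efficiency and generalization remains an under-explored area. We address this gap by proposing five strategies for designing information-dense observations, focusing on general features that are applicable to most traffic scenarios. We train our RL agents using these strategies on an intersection and evaluate their generalization through numerical experiments across completely unseen traffic scenarios, including a new intersection, an on-ramp, and a roundabout. Incorporating these information-dense observations reduces training times to under one hour on a single CPU, and the evaluation results show that our RL agents can effectively generalize zero-shot. Code: github.com/cas-lab-munich/SigmaRL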

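Nested Graph Reinforcement Learning-Based Decision-Making Strategy for Eco-Platooning

Categories: Multiagent Systems, Machine Learning

Authors: Xin Gao, Xueyuan Li, Hao Liu, Ao Li, Zhaoyang Ma, Zirui Li

Published: 2024-08-14

Link: http://arxiv.org/abs/2408.07578v1

Abstract: Platooning technology is known for its precise vehicle control, traffic flow optimization, and improved energy efficiency. However, in large-scale mixed platoons, vehicle heterogeneity and unpredictable traffic conditions give rise to virtual bottlenecks. These bottlenecks reduce traffic throughput and increase energy consumption within the platoon. To address these challenges, we introduce a decision-making strategy based on nested graph reinforcement learning. The strategy improves collaborative decision-making, ensuring energy efficiency and alleviating congestion. We propose a theory of nested traffic graph representation that maps the dynamic interactions between vehicles and platoons in non-Euclidean spaces. By incorporating spatio-temporal weighted graphs into a multi-head attention mechanism, we further enhance the model's capacity to process both local and global data. In addition, we develop a nested graph reinforcement learning framework to strengthen the self-iterative learning capability of platooning. Using the I-24 dataset, we design and run comparative algorithm experiments, generalizability tests, and penetration-rate ablation experiments, thereby validating the effectiveness of the proposed strategy. Compared to the baseline, our strategy increases throughput by 10% and reduces energy consumption by 9%. Specifically, increasing the penetration rate of CAVs significantly enhances traffic throughput, though it also increases energy consumption.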

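Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems

Categories: Multiagent Systems

Authors: Zhuohui Zhang, Bin He, Bin Cheng, Gang Li

Published: 2024-08-14

Link: http://arxiv.org/abs/2408.07397v1

Abstract: Multi-agent systems must learn to communicate and to understand interactions between agents in order to achieve cooperative goals in partially observed tasks. However, existing approaches lack a dynamic directed communication mechanism and rely on global states, which diminishes the role of communication in centralized training. We therefore propose the Transformer-based graph coarsening network (TGCNet), a novel multi-agent reinforcement learning (MARL) algorithm. TGCNet learns the topology of a dynamic directed graph to represent the communication policy and integrates graph coarsening networks to approximate the global state during training. It also uses a Transformer decoder for feature extraction during execution. Experiments on multiple cooperative MARL benchmarks demonstrate state-of-the-art performance compared to popular MARL algorithms. Further ablation studies validate the effectiveness of our dynamic directed graph communication mechanism and graph coarsening networks.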

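Improving Global Parameter Sharing in Physically Heterogeneous Multi-Agent Reinforcement Learning with a Unified Action Space

Categories: Multiagent Systems, Artificial Intelligence

Authors: Xiaoyang Yu, Youfang Lin, Shuo Wang, Kai Lv, Sheng Han

Published: 2024-08-14

Link: http://arxiv.org/abs/2408.07395v1

Abstract: In a multi-agent system (MAS), action semantics indicates the different influences of agents' actions on other entities and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter sharing across different types of heterogeneous agents without carefully distinguishing between action semantics. This common implementation weakens cooperation and coordination between agents in complex situations. Fully independent agent parameters, however, dramatically increase computational cost and training difficulty. To benefit from the use of different action semantics while maintaining a proper parameter-sharing structure, we introduce the Unified Action Space (UAS). The UAS is the union of all agent actions with different semantics. All agents first compute their unified representation in the UAS and then generate their heterogeneous action policies using different available-action masks. To further improve the training of the extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss that uses trajectory information to predict the policies of agents in other groups. As a general method for physically heterogeneous MARL problems, we add the UAS to both value-based and policy-based MARL algorithms and propose two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment demonstrate the effectiveness of U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.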
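
The step in which agents share one representation over the unified action space and then apply type-specific available-action masks can be sketched with a masked softmax. The action names, agent types, and logits below are hypothetical; the abstract does not specify the network that produces the logits, so they are simply given as inputs here.

```python
import math

# Hypothetical unified action space: the union of every agent type's actions.
UNIFIED_ACTIONS = ["move", "attack", "heal"]

# Hypothetical availability masks, one per agent type.
ACTION_MASKS = {
    "marine": [True, True, False],  # can move and attack
    "medic": [True, False, True],   # can move and heal
}


def masked_policy(logits, mask):
    """Softmax over unified-action logits, with unavailable actions forced to probability 0."""
    available = [x for x, ok in zip(logits, mask) if ok]
    if not available:
        raise ValueError("at least one action must be available")
    top = max(available)  # stabilize the exponentials
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    shared_logits = [0.2, 1.5, 0.7]  # output of a shared network (assumed)
    for agent_type, mask in ACTION_MASKS.items():
        print(agent_type, masked_policy(shared_logits, mask))
```

The same logits yield different policies for the two agent types, which is how a single shared parameter set can still produce heterogeneous behavior.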

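Multi-Agent Continuous Control with Generative Flow Networks

Categories: Artificial Intelligence, Multiagent Systems

Authors: Shuang Luo, Yinchuan Li, Shunyu Liu, Xu Zhang, Yunfeng Shao, Chao Wu

Published: 2024-08-13

Link: http://arxiv.org/abs/2408.06920v1

Abstract: Generative Flow Networks (GFlowNets) aim to generate diverse trajectories from a distribution in which the final states of the trajectories are proportional to the reward, serving as a powerful alternative to reinforcement learning for exploratory control tasks. However, the individual-flow matching constraint in GFlowNets limits their application to multi-agent systems, especially continuous joint-control problems. In this paper, we propose a novel Multi-Agent generative Continuous Flow Networks (MACFN) method that enables multiple agents to explore cooperatively for various compositional continuous objects. Technically, MACFN trains decentralized individual-flow-based policies in a centralized global-flow-based matching fashion. During centralized training, MACFN introduces a continuous flow decomposition network to deduce the flow contribution of each agent when only global rewards are available. Agents can then act in a decentralized way based solely on their assigned local flows, forming a joint policy distribution proportional to the rewards. To guarantee the expressiveness of continuous flow decomposition, we theoretically derive a consistency condition on the decomposition network. Experimental results show that the proposed method outperforms state-of-the-art counterparts and has better exploration capability. Our code is available at https://github.com/isluoshuang/MACFN.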

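QTypeMix: Enhancing Multi-Agent Cooperative Strategies through Heterogeneous and Homogeneous Value Decomposition

Categories: Multiagent Systems, Artificial Intelligence, I.2.6; I.2.11

Authors: Songchen Fu, Shaojing Zhao, Ta Li, YongHong Yan

Published: 2024-08-12

Link: http://arxiv.org/abs/2408.07098v1

Abstract: In multi-agent cooperative tasks, the presence of heterogeneous agents is common. Compared with cooperation among homogeneous agents, collaboration must take into account the sub-tasks each agent is best suited for. However, the operation of multi-agent systems often involves a large amount of complex interaction information, which makes learning heterogeneous strategies more challenging. Related multi-agent reinforcement learning methods sometimes use grouping mechanisms to form smaller cooperative groups or leverage prior domain knowledge to learn strategies for different roles. In contrast, agents should learn deeper role features without relying on additional information. We therefore propose QTypeMix, which divides the value decomposition process into homogeneous and heterogeneous stages. QTypeMix learns to extract type features from local historical observations through the TE loss. In addition, we introduce advanced network structures containing attention mechanisms and hypernetworks to enhance representation capability and carry out the value decomposition process. Results from testing the proposed method on 14 maps from SMAC and SMACv2 show that QTypeMix achieves state-of-the-art performance on tasks of varying difficulty.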

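Decentralized Cooperation in Heterogeneous Multi-Agent Reinforcement Learning via Graph Neural Network-Based Intrinsic Motivation

Categories: Multiagent Systems, Artificial Intelligence, Robotics, I.2.6; I.2.9; I.2.11

Authors: Jahir Sadik Monon, Deeparghya Dutta Barua, Md. Mosaddek Khan

Published: 2024-08-12

Link: http://arxiv.org/abs/2408.06503v1

Abstract: Multi-agent reinforcement learning (MARL) is emerging as a key framework for various sequential decision-making and control tasks. Unlike their single-agent counterparts, multi-agent systems require successful cooperation among the agents. Deploying these systems in real-world scenarios often requires decentralized training, a diverse set of agents, and learning from infrequent environmental reward signals. These challenges become more pronounced under partial observability and the lack of prior knowledge about agent heterogeneity. While notable studies use intrinsic motivation (IM) to address reward sparsity or cooperation in decentralized settings, those dealing with heterogeneity typically assume centralized training, parameter sharing, and agent indexing. To overcome these limitations, we propose the CoHet algorithm, which uses a novel graph neural network (GNN) based intrinsic motivation to facilitate the learning of heterogeneous agent policies in decentralized settings under the challenges of partial observability and reward sparsity. Evaluation of CoHet on the Multi-agent Particle Environment (MPE) and Vectorized Multi-Agent Simulator (VMAS) benchmarks shows superior performance compared to the state of the art in a range of cooperative multi-agent scenarios. Our research is supplemented by an analysis of the impact of the agent dynamics model on the intrinsic motivation module, insights into the performance of different CoHet variants, and its robustness to an increasing number of heterogeneous agents.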

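Distributed Stackelberg Strategies in State-Based Potential Games for Autonomous Decentralized Learning Manufacturing Systems

Categories: Computer Science and Game Theory, Artificial Intelligence, Machine Learning, Multiagent Systems

Authors: Steve Yuwono, Dorothea Schwung, Andreas Schwung

Published: 2024-08-12

Link: http://arxiv.org/abs/2408.06397v1

Abstract: This article describes a novel game structure for autonomously optimizing decentralized manufacturing systems with multi-objective optimization challenges, namely Distributed Stackelberg Strategies in State-Based Potential Games (DS2-SbPG). DS2-SbPG integrates potential games and Stackelberg games, which improves the cooperative trade-off capabilities of potential games and the multi-objective optimization handling of Stackelberg games. Notably, all training procedures remain fully distributed. DS2-SbPG offers a promising way to find optimal trade-offs between objectives by eliminating the complexity of setting up a combined objective optimization function for individual players in self-learning domains, particularly in real-world industrial settings with diverse and numerous objectives between the sub-systems. We further prove that DS2-SbPG constitutes a dynamic potential game, which yields corresponding convergence guarantees. Experimental validation on a laboratory-scale testbed highlights the efficacy of DS2-SbPG and its two variants: DS2-SbPG for single-leader-follower settings and Stack DS2-SbPG for multi-leader-follower settings. The results show significant reductions in power consumption and improvements in overall performance, which signals the potential of DS2-SbPG in real-world applications.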

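Quantum Annealing-Based Algorithm for Efficient Coalition Formation Among LEO Satellites

Categories: Quantum Physics, Computational Complexity, Discrete Mathematics, Multiagent Systems

Authors: Supreeth Mysore Venkatesh, Antonio Macaluso, Marlon Nuske, Matthias Klusch, Andreas Dengel

Published: 2024-08-12

Link: http://arxiv.org/abs/2408.06007v1

Abstract: The growing number of Low Earth Orbit (LEO) satellites, driven by lower manufacturing and launch costs, is proving invaluable for Earth observation missions and low-latency internet connectivity. However, as the number of satellites increases, so does the number of communication links to maintain, which makes the management of this vast network increasingly challenging and highlights the need to cluster satellites into efficient groups as a promising solution. This paper formulates the clustering of LEO satellites as a coalition structure generation (CSG) problem and uses quantum annealing to solve it. We represent the satellite network as a graph and obtain the optimal partitions using a hybrid quantum-classical algorithm called GCS-Q. The algorithm follows a top-down approach, iteratively splitting the graph at each step using a quadratic unconstrained binary optimization (QUBO) formulation. To evaluate our approach, we use real-world three-line element set (TLE/3LE) data for Starlink satellites from Celestrak. Our experiments, conducted with the D-Wave Advantage annealer and the state-of-the-art solver Gurobi, show that the quantum annealer significantly outperforms classical methods in runtime while maintaining solution quality. The performance achieved with quantum annealers surpasses the capabilities of classical computers, highlighting the transformative potential of quantum computing in optimizing the management of large-scale satellite networks.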
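
A top-down split driven by a QUBO can be sketched as below. This is not the GCS-Q implementation: exhaustive enumeration stands in for the annealer, the stopping rule (stop when no split has negative cut weight) is an assumption, and the node names and edge weights are hypothetical. Positive weights mean two nodes prefer the same coalition, negative weights mean they prefer to be apart.

```python
from itertools import product


def cut_qubo(nodes, weights):
    """Upper-triangular Q such that x^T Q x equals the total weight of edges cut by split x.

    For binary x, edge (i, j) is cut exactly when x_i + x_j - 2*x_i*x_j == 1.
    """
    index = {n: k for k, n in enumerate(nodes)}
    q = [[0.0] * len(nodes) for _ in nodes]
    for (u, v), w in weights.items():
        if u in index and v in index:  # ignore edges leaving this subgraph
            i, j = index[u], index[v]
            q[i][i] += w
            q[j][j] += w
            q[min(i, j)][max(i, j)] -= 2 * w
    return q


def minimize_qubo(q):
    """Exhaustive minimizer; a stand-in for a quantum annealer or classical solver."""
    n = len(q)
    best_x, best_energy = None, float("inf")
    for x in product([0, 1], repeat=n):
        energy = sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


def split_coalitions(nodes, weights):
    """Recursively bipartition while some split removes net-negative edge weight."""
    if len(nodes) < 2:
        return [list(nodes)]
    x, energy = minimize_qubo(cut_qubo(nodes, weights))
    if energy >= 0:  # every split cuts at least as much positive as negative weight
        return [list(nodes)]
    left = [n for n, bit in zip(nodes, x) if bit == 0]
    right = [n for n, bit in zip(nodes, x) if bit == 1]
    return split_coalitions(left, weights) + split_coalitions(right, weights)


if __name__ == "__main__":
    w = {("a", "b"): 3, ("c", "d"): 2, ("b", "c"): -4, ("a", "d"): -1}
    print(split_coalitions(["a", "b", "c", "d"], w))
```

Because the all-zeros and all-ones assignments both have energy 0, a negative minimum always corresponds to a split with two non-empty sides.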

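Optimizing RAG Techniques for Automotive Industry PDF Chatbots: A Case Study with Locally Deployed Ollama Models

Categories: Information Retrieval, Artificial Intelligence, Multiagent Systems

Authors: Fei Liu, Zejun Kang, Xing Han

Published: 2024-08-12

Link: http://arxiv.org/abs/2408.05933v1

Abstract: With the growing demand for offline PDF chatbots in automotive industrial production environments, optimizing the deployment of large language models (LLMs) in local, low-performance settings has become increasingly important. This study focuses on enhancing Retrieval-Augmented Generation (RAG) techniques for processing complex automotive industry documents using locally deployed Ollama models. Building on the Langchain framework, we propose a multi-dimensional optimization approach for Ollama's local RAG implementation. Our method addresses key challenges in automotive document processing, including multi-column layouts and technical specifications. We introduce improvements in PDF processing, retrieval mechanisms, and context compression, tailored to the unique characteristics of automotive industry documents. In addition, we design custom classes supporting embedding pipelines and an agent supporting self-RAG, based on LangGraph best practices. To evaluate our approach, we constructed a proprietary dataset of typical automotive industry documents, including technical reports and corporate regulations. We compared our optimized RAG model and self-RAG agent against a naive RAG baseline on three datasets: our automotive industry dataset, QReCC, and CoQA. The results show significant improvements in context precision, context recall, answer relevancy, and faithfulness, with particularly strong performance on the automotive industry dataset. Our optimization scheme provides an effective solution for deploying local RAG systems in the automotive sector and meets the specific needs of PDF chatbots in industrial production environments. This research has important implications for advancing information processing and intelligent production in the automotive industry.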

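Fast and Communication-Efficient Multi-UAV Exploration via Voronoi Partition on a Dynamic Topological Graph

Categories: Robotics, Multiagent Systems

Authors: Qianli Dong, Haobo Xi, Shiyong Zhang, Qingchen Bi, Tianyi Li, Ziyu Wang, Xuebo Zhang

Published: 2024-08-11

Link: http://arxiv.org/abs/2408.05808v1

Abstract: Efficient data transmission and reasonable task allocation are important for improving multi-robot exploration efficiency. However, most communication data types contain redundant information and therefore require large communication volumes. Moreover, exploration-oriented task allocation is far from trivial and becomes even more challenging for resource-limited unmanned aerial vehicles (UAVs). In this paper, we propose a fast and communication-efficient multi-UAV exploration method for exploring large environments. We first design a multi-robot dynamic topological graph (MR-DTG), consisting of nodes that represent explored and exploring regions and edges that connect the nodes. Supported by the MR-DTG, our method achieves efficient communication by transferring only the information required for exploration planning. To further improve exploration efficiency, a hierarchical multi-UAV exploration method is designed using the MR-DTG. Specifically, graph Voronoi partition is used to allocate MR-DTG nodes to the closest UAVs, taking the actual motion cost into account, which achieves reasonable task allocation. To our knowledge, this is the first work to address multi-UAV exploration using graph Voronoi partition. The proposed method is compared with a state-of-the-art method in simulations. The results show that it reduces exploration time and communication volume by up to 38.3% and 95.5%, respectively. Finally, the effectiveness of our method is validated in a real-world experiment with 6 UAVs. We will release the source code to benefit the community.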
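
Graph Voronoi partitioning, as described above, assigns each graph node to the UAV with the lowest travel cost to it. One common way to compute this is a multi-source Dijkstra search, sketched below. The graph, costs, and UAV names are hypothetical, and the paper's MR-DTG construction and motion-cost model are not reproduced here.

```python
import heapq


def graph_voronoi(adjacency, sites):
    """Assign every reachable node to the site (UAV) with the smallest path cost.

    adjacency: node -> list of (neighbor, edge_cost) pairs, costs non-negative.
    sites: UAV name -> node where that UAV currently is.
    Returns (owner, dist): owning UAV and path cost for each reachable node.
    Ties are broken by UAV name because of tuple ordering in the heap.
    """
    owner, dist = {}, {}
    heap = [(0.0, name, node) for name, node in sites.items()]
    heapq.heapify(heap)
    while heap:
        d, name, node = heapq.heappop(heap)
        if node in owner:  # already claimed by a closer (or equal) site
            continue
        owner[node], dist[node] = name, d
        for neighbor, cost in adjacency.get(node, []):
            if neighbor not in owner:
                heapq.heappush(heap, (d + cost, name, neighbor))
    return owner, dist


if __name__ == "__main__":
    graph = {
        "A": [("B", 1)],
        "B": [("A", 1), ("C", 2)],
        "C": [("B", 2), ("D", 1)],
        "D": [("C", 1), ("E", 1)],
        "E": [("D", 1)],
    }
    print(graph_voronoi(graph, {"uav1": "A", "uav2": "E"}))
```

Using path cost on the graph instead of straight-line distance is what lets the partition respect obstacles and actual motion cost.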

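The Bandit Whisperer: Communication Learning for Restless Bandits

Categories: Machine Learning, Multiagent Systems

Authors: Yunfan Zhao, Tonghan Wang, Dheeraj Nagaraj, Aparna Taneja, Milind Tambe

Published: 2024-08-11

Link: http://arxiv.org/abs/2408.05686v1

Abstract: Applying reinforcement learning (RL) to restless multi-armed bandits (RMABs) offers a promising avenue for addressing allocation problems with resource constraints and temporal dynamics. However, classic RMAB models largely overlook the challenge of (systematic) data errors, which are common in real-world scenarios due to factors such as varying data collection protocols and intentional noise for differential privacy. We show that conventional RL algorithms used to train RMABs struggle to perform well in such settings. To address this, we propose the first communication learning approach in RMABs, in which we study which arms, when involved in communication, are most effective in mitigating the influence of such systematic data errors. In our setup, arms receive Q-function parameters from similar arms as messages to guide behavioral policies and steer Q-function updates. We learn communication strategies by considering the joint utility of messages across all pairs of arms and by using a Q-network architecture that decomposes the joint utility. Both theoretical and empirical evidence validate the effectiveness of our method in significantly improving RMAB performance across diverse problems.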

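Hierarchical Multi-Armed Bandits for the Concurrent Intelligent Tutoring of Concepts and Problems of Varying Difficulty

Categories: Computers and Society, Artificial Intelligence, Human-Computer Interaction, Machine Learning, Multiagent Systems

Authors: Blake Castleman, Uzay Macar, Ansaf Salleb-Aouissi

Published: 2024-08-10

Link: http://arxiv.org/abs/2408.07208v1

Abstract: Remote education proliferated in the twentieth century, giving rise to intelligent tutoring systems. In particular, research has found multi-armed bandit (MAB) intelligent tutors to be notably capable of traversing the exploration-exploitation trade-off landscape when recommending problems to students. However, prior literature severely lacks open-source MAB intelligent tutors, which impedes potential applications of these educational MAB recommendation systems. In this paper, we combine recent literature on MAB intelligent tutoring techniques into an open-source, easily deployable hierarchical MAB algorithm that can progress students concurrently through concepts and problems, determine ideal recommended problem difficulties, and assess latent memory decay. We evaluate our algorithm using simulated groups of 500 students, with Bayesian Knowledge Tracing used to estimate students' content mastery. Results suggest that our algorithm, when difficulty-agnostic, substantially boosts student success, and that adding problem-difficulty adaptation improves this metric notably further.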
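
Bayesian Knowledge Tracing, which the abstract uses to estimate mastery, has a standard two-step update: a Bayes posterior given the observed answer, followed by a learning transition. The sketch below shows that textbook update only; the parameter values are hypothetical, and the paper's memory-decay and hierarchical bandit components are not modeled.

```python
def bkt_update(p_mastery, correct, p_slip, p_guess, p_learn):
    """One Bayesian Knowledge Tracing step.

    p_mastery: prior probability that the student has mastered the concept.
    correct:   whether the observed answer was correct.
    p_slip:    probability a student who has mastered the concept answers wrongly.
    p_guess:   probability a student who has not mastered it answers correctly.
    p_learn:   probability of acquiring mastery after this practice opportunity.
    """
    if correct:
        evidence_mastered = p_mastery * (1.0 - p_slip)
        evidence_unmastered = (1.0 - p_mastery) * p_guess
    else:
        evidence_mastered = p_mastery * p_slip
        evidence_unmastered = (1.0 - p_mastery) * (1.0 - p_guess)
    posterior = evidence_mastered / (evidence_mastered + evidence_unmastered)
    # Learning transition: an unmastered concept may become mastered.
    return posterior + (1.0 - posterior) * p_learn


if __name__ == "__main__":
    p = 0.5
    for answer in [True, True, False, True]:
        p = bkt_update(p, answer, p_slip=0.1, p_guess=0.2, p_learn=0.1)
        print(round(p, 4))
```

A tutor could feed the resulting mastery estimate into its bandit reward or use it to decide when to move on to the next concept.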

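Mitigating Metropolitan Carbon Emissions with Dynamic Eco-Driving at Scale

Categories: Systems and Control, Artificial Intelligence, Machine Learning, Multiagent Systems, Robotics

Authors: Vindula Jayawardana, Baptiste Freydt, Ao Qu, Cameron Hickert, Edgar Sanchez, Catherine Tang, Mark Taylor, Blaine Leonard, Cathy Wu

Published: 2024-08-10

Link: http://arxiv.org/abs/2408.05609v1

Abstract: The sheer scale and diversity of transportation make it a formidable sector to decarbonize. Here, we consider an emerging opportunity to reduce carbon emissions: the growing adoption of semi-autonomous vehicles, which can be programmed to mitigate stop-and-go traffic through intelligent speed commands and thus reduce emissions. But would such dynamic eco-driving move the needle on climate change? A comprehensive impact analysis has been out of reach because of the vast array of traffic scenarios and the complexity of vehicle emissions. We address this challenge with large-scale scenario modeling efforts and by using multi-task deep reinforcement learning with a carefully designed network decomposition strategy. We perform an in-depth prospective impact assessment of dynamic eco-driving at 6,011 signalized intersections across three major US metropolitan cities, simulating one million traffic scenarios. Overall, we find that vehicle trajectories optimized for emissions can cut city-wide intersection carbon emissions by 11-22% without harming throughput or safety, which under reasonable assumptions is equivalent to the national emissions of Israel and Nigeria, respectively. We find that 10% eco-driving adoption yields 25%-50% of the total reduction, and that nearly 70% of the benefits come from 20% of intersections, suggesting near-term implementation pathways. However, the composition of this high-impact subset of intersections varies considerably across adoption levels, with minimal overlap, calling for careful strategic planning of eco-driving deployments. Moreover, the impact of eco-driving remains significant when considered jointly with projections of vehicle electrification and hybrid vehicle adoption. More broadly, this work paves the way for large-scale analysis of traffic externalities, such as time, safety, and air quality, and of the potential impact of solution strategies.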

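Evolutionary Mechanisms That Promote Cooperation May Not Promote Social Welfare

Categories: Dynamical Systems, Artificial Intelligence, Computer Science and Game Theory, Multiagent Systems, Adaptation and Self-Organizing Systems

Authors: The Anh Han, Manh Hong Duong, Matjaz Perc

Published: 2024-08-09

Link: http://arxiv.org/abs/2408.05373v1

Abstract: Understanding the emergence of prosocial behaviors among self-interested individuals is an important problem in many scientific disciplines. Various mechanisms have been proposed to explain the evolution of such behaviors, primarily seeking the conditions under which a given mechanism can induce the highest levels of cooperation. However, because these mechanisms usually involve costs that alter individual payoffs, aiming for the highest levels of cooperation may be detrimental to social welfare, broadly defined as the total population payoff once all the costs involved in inducing increased prosocial behavior are taken into account. Here, by comparatively analyzing the social welfare and cooperation levels obtained from stochastic evolutionary models of two well-established mechanisms of prosocial behavior, namely peer and institutional incentives, we demonstrate exactly that. We show that the objectives of maximizing cooperation levels and of maximizing social welfare are often misaligned. We argue for the need to use social welfare as the main optimization objective when designing and implementing evolutionary mechanisms for social and collective goods.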

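Performative Prediction on Games and Mechanism Design

Categories: Machine Learning, Computer Science and Game Theory, Multiagent Systems

Authors: António Góis, Mehrnaz Mofakhami, Fernando P. Santos, Simon Lacoste-Julien, Gauthier Gidel

Published: 2024-08-09

Link: http://arxiv.org/abs/2408.05146v1

Abstract: Predictions often influence the reality they aim to predict, an effect known as performativity. Existing work focuses on accuracy maximization under this effect, but model deployment may have important unintended impacts, especially in multi-agent scenarios. In this work, we study performative prediction in a concrete game-theoretic setting where social welfare is an alternative objective to accuracy maximization. We explore a collective risk dilemma scenario in which maximizing accuracy can negatively impact social welfare when predicting collective behaviors. By assuming knowledge of a Bayesian model of agent behavior, we show how to achieve better trade-offs and use them for mechanism design.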

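Performance Prediction of Hub-Based Swarms

Categories: Multiagent Systems, Artificial Intelligence, Machine Learning

Authors: Puneet Jain, Chaitanya Dwivedi, Vigynesh Bhatt, Nick Smith, Michael A Goodrich

Published: 2024-08-09

Link: http://arxiv.org/abs/2408.04822v1

Abstract: A hub-based colony consists of multiple agents that share a common nest site called the hub. Agents perform tasks away from the hub, such as foraging for food or gathering information about future nest sites. Modeling hub-based colonies is challenging because the size of the collective state space grows rapidly as the number of agents grows. This paper presents a graph-based representation of the colony that can be combined with graph-based encoders to create low-dimensional representations of collective state that scale to many agents for a best-of-N colony problem. We demonstrate how the information in the low-dimensional embedding can be used in two experiments. First, we show how the information in the tensor can be used to cluster collective states by the probability of choosing the best site for a very small problem. Second, we show how structured collective trajectories emerge when a graph encoder is used to learn the low-dimensional embedding, and that these trajectories carry information that can be used to predict swarm performance.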

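A Multi-Scale Cognitive Interaction Model of Instrument Operations at the Linac Coherent Light Source

Categories: Human-Computer Interaction, Multiagent Systems, High Energy Physics - Experiment

Authors: Jonathan Segal, Wan-Lin Hu, Paul Fuoss, Frank E. Ritter, Jeff Shrager

Published: 2024-08-08

Link: http://arxiv.org/abs/2408.04734v1

Abstract: We describe a novel multi-agent, multi-scale computational cognitive interaction model of instrument operations at the Linac Coherent Light Source (LCLS). A leading scientific user facility, LCLS is the world's first hard X-ray free electron laser, operated by the SLAC National Accelerator Laboratory for the U.S. Department of Energy. As the world's first X-ray free electron laser, LCLS is in high demand and heavily oversubscribed. Our overall project applies cognitive engineering methodologies to improve experimental efficiency and scientific productivity by refining experimental interfaces and workflows, simplifying tasks, reducing errors, and improving operator safety and stress levels. Our model simulates aspects of human cognition at multiple cognitive and temporal scales, ranging from seconds to hours, and among agents playing multiple roles, including instrument operator, real-time data analyst, and experiment manager. The model can predict the impact of proposed changes to operational interfaces and workflows. Because the model code is open source and supplementary videos detail aspects of the model and results, the approach can be applied to other experimental apparatus and processes. Example results demonstrate the model's potential for guiding modifications that improve operational efficiency and scientific output. We discuss the implications of our findings for cognitive engineering in complex experimental settings and outline future research directions.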

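Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity

Categories: Multiagent Systems, Computer Science and Game Theory, 93A16

Authors: Martin Smit, Fernando P. Santos

Published: 2024-08-08

Link: http://arxiv.org/abs/2408.04549v1

Abstract: Altruistic cooperation is costly yet socially desirable. As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL). Indirect reciprocity, in which agents consider the reputation of their interaction partner, has been shown to stabilize cooperation in homogeneous, idealized populations. However, more realistic settings consist of heterogeneous agents with different characteristics and group-based social identities. We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information. We consider two modeling approaches: evolutionary game theory, where we comprehensively search for social norms (that is, rules for assigning reputations) that lead to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affect the analytically identified equilibria. We observe that a defecting majority leads the minority group to defect, but not the reverse. Moreover, changing the norms that judge in-group and out-group interactions can steer a system toward either fair or unfair cooperation. This becomes clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs with a narrower set of norms. Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackling the dilemmas of both cooperation and fairness.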

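Emergence in Multi-Agent Systems: A Safety Perspective

Categories: Multiagent Systems

Authors: Philipp Altmann, Julian Schönberger, Steffen Illium, Maximilian Zorn, Fabian Ritz, Tom Haider, Simon Burton, Thomas Gabor

Published: 2024-08-08

Link: http://arxiv.org/abs/2408.04514v1

Abstract: Emergent effects can arise in multi-agent systems (MAS) whose execution is decentralized and relies on local information. These effects may range from minor deviations in behavior to catastrophic system failures. To define these effects formally, we identify deviations between the global inherent specification (the true specification) and its local approximation, such as the configuration of different reward components or observations. Using established safety terminology, we develop a framework for understanding these emergent effects. To showcase the resulting implications, we use two broadly configurable exemplary gridworld scenarios, in which insufficient specification leads to unintended behavior deviations when derived independently. Recognizing that a global adaptation might not always be feasible, we propose adjusting the underlying parameterizations to mitigate these issues, thereby improving system alignment and reducing the risk of emergent failures.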

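Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization

Categories: Multiagent Systems, Artificial Intelligence, Machine Learning, Robotics

Authors: Aditya Kapoor, Benjamin Freed, Howie Choset, Jeff Schneider

Published: 2024-08-08

Link: http://arxiv.org/abs/2408.04295v1

Abstract: Multi-agent proximal policy optimization (MAPPO) has recently demonstrated state-of-the-art performance on challenging multi-agent reinforcement learning tasks. However, MAPPO still struggles with the credit assignment problem, in which the sheer difficulty of ascribing credit to individual agents' actions scales poorly with team size. In this paper, we propose a multi-agent reinforcement learning algorithm that adapts recent developments in credit assignment to improve upon MAPPO. Our approach uses partial reward decoupling (PRD), which employs a learned attention mechanism to estimate which of a particular agent's teammates are relevant to its learning updates. We use this estimate to dynamically decompose large groups of agents into smaller, more manageable subgroups. We show empirically that our approach, PRD-MAPPO, decouples agents from teammates that do not influence their expected future reward, thereby streamlining credit assignment. We also show that PRD-MAPPO yields significantly higher data efficiency and asymptotic performance than both MAPPO and other state-of-the-art methods across several multi-agent tasks, including StarCraft II. Finally, we propose a version of PRD-MAPPO that applies to the shared reward setting, where PRD was previously not applicable, and show empirically that this also leads to performance improvements over MAPPO.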

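Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning

Categories: Multiagent Systems

Authors: Yongheng Liang, Hejun Wu, Haitao Wang, Hao Cai

Published: 2024-08-07

Link: http://arxiv.org/abs/2408.03692v1

Abstract: Credit assignment is a core problem that distinguishes agents' marginal contributions for optimizing cooperative strategies in multi-agent reinforcement learning (MARL). Current credit assignment methods usually assume synchronous decision-making among agents. However, a prerequisite for many realistic cooperative tasks is asynchronous decision-making by agents, without waiting for others, in order to avoid disastrous consequences. To address this issue, we propose an asynchronous credit assignment framework with a problem model called ADEX-POMDP and a multiplicative value decomposition (MVD) algorithm. ADEX-POMDP is an asynchronous problem model with extra virtual agents for a decentralized partially observable Markov decision process. We prove that ADEX-POMDP preserves both the task equilibrium and the algorithm convergence. MVD uses multiplicative interaction to efficiently capture the interactions of asynchronous decisions, and we theoretically demonstrate its advantages in handling asynchronous tasks. Experimental results show that on two asynchronous decision-making benchmarks, Overcooked and POAC, MVD not only consistently outperforms state-of-the-art MARL methods but also provides interpretability for asynchronous cooperation.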

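Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents

Categories: Multiagent Systems, Artificial Intelligence, Machine Learning

Authors: Lucia Gordon, Esther Rolf, Milind Tambe

Published: 2024-08-06

Link: http://arxiv.org/abs/2408.03405v1

Abstract: Stochastic multi-agent multi-armed bandits typically assume that the rewards from each arm follow a fixed distribution, regardless of which agent pulls the arm. However, in many real-world settings, rewards can depend on the sensitivity of each agent to its environment. In medical screening, disease detection rates can vary by test type; in preference matching, rewards can depend on user preferences; and in environmental sensing, observation quality can vary across sensors. Since past work does not specify how to allocate agents of heterogeneous but known sensitivity in a stochastic bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates information from diverse agents. In doing so, we address the joint challenges of (i) aggregating rewards that follow different distributions for each agent-arm pair, and (ii) coordinating the assignment of agents to arms. Min-Width facilitates efficient collaboration among heterogeneous agents, exploiting the known structure in the agents' reward functions to weight their rewards accordingly. We analyze the regret of Min-Width and conduct pseudo-synthetic and fully synthetic experiments to study the performance of different levels of information sharing. Our results confirm that the gains from modeling agent heterogeneity tend to be greater when the sensitivities are more varied across agents, while combining more information does not always improve performance.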

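Heterogeneous Graph Attention Network Improves Cancer Multiomics Integration

Categories: Machine Learning, Multiagent Systems, Biomolecules, Genomics

Authors: Sina Tabakhi, Charlotte Vandermeulen, Ian Sudbery, Haiping Lu

Published: 2024-08-05

Link: http://arxiv.org/abs/2408.02845v1

Abstract: The increase in high-dimensional multiomics data demands advanced integration models to capture the complexity of human diseases. Graph-based deep learning integration models, despite their promise, struggle with small patient cohorts and high-dimensional features, and often apply independent feature selection without modeling relationships among omics. Furthermore, conventional graph-based omics models focus on homogeneous graphs, lacking multiple types of nodes and edges to capture diverse structures. We introduce a Heterogeneous Graph ATtention network for omics integration (HeteroGATomics) to improve cancer diagnosis. HeteroGATomics performs joint feature selection through a multi-agent system, creating dedicated networks of feature and patient similarity for each omic modality. These networks are then combined into one heterogeneous graph for learning holistic omic-specific representations and integrating predictions across modalities. Experiments on three cancer multiomics datasets demonstrate HeteroGATomics' superior performance in cancer diagnosis. Moreover, HeteroGATomics enhances interpretability by identifying important biomarkers that contribute to the diagnosis outcomes.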

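Evaluating the Impact of Container Handling Strategies on Improving Freight Throughput

Categories: Multiagent Systems, Computers and Society

Authors: Sarita Rattanakunuprakarn, Mingzhou Jin, Mustafa Can Camur, Xueping Li

Published: 2024-08-05

Link: http://arxiv.org/abs/2408.02768v1

Abstract: As global supply chains and freight volumes grow, the U.S. faces rising transportation demand. Heavy reliance on road transport, coupled with the underutilized railway system, results in congested highways, prolonged transportation times, higher costs, and increased carbon emissions. California's San Pedro Port Complex (SPPC), the nation's busiest port, bears a significant share of these challenges. We use agent-based simulation to replicate real-world scenarios, focusing on the intricate interactions in a modified intermodal inbound freight system for the SPPC. This involves relocating container classification to potential warehouses in California, Utah, Arizona, and Nevada, rather than carrying it out exclusively at port areas. Our primary objective is to evaluate the efficiency of the proposed system, considering cost and freight throughput, while also examining the effects of workforce shortages. Computational analysis suggests that strategically installing intermodal capabilities in select warehouses can reduce transportation costs, improve throughput, and foster resource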

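ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems

Categories: Computation and Language, Multiagent Systems, Software Engineering, I.2.7

Authors: Andrew Zhu, Liam Dugan, Chris Callison-Burch

Published: 2024-08-05

Link: http://arxiv.org/abs/2408.02248v1

Abstract: Recently, there has been increasing interest in using large language models (LLMs) to build complex multi-agent systems that perform tasks such as compiling literature reviews, drafting consumer reports, and planning vacations. Many tools and libraries exist to help create such systems, but none support recursive multi-agent systems, in which the models themselves flexibly decide when to delegate tasks and how to organize their delegation structure. In this work, we introduce ReDel: a toolkit for recursive multi-agent systems that supports custom tool use, delegation schemes, event-based logging, and interactive replay in an easy-to-use web interface. We show that, using ReDel, we are able to achieve significant performance gains on agentic benchmarks and to easily identify potential areas of improvement through the visualization and debugging tools. Our code, documentation, and PyPI package are open source and free to use under the MIT license.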

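Environment Complexity and Nash Equilibria in a Sequential Social Dilemma

Categories: Computer Science and Game Theory, Artificial Intelligence, Multiagent Systems

Authors: Mustafa Yasir, Andrew Howes, Vasilios Mavroudis, Chris Hicks

Published: 2024-08-04

Link: http://arxiv.org/abs/2408.02148v1

Abstract: Multi-agent reinforcement learning (MARL) methods, while effective in zero-sum or positive-sum games, often yield suboptimal outcomes in general-sum games, where cooperation is essential for achieving globally optimal outcomes. Matrix game social dilemmas, which abstract key aspects of general-sum interactions such as cooperation, risk, and trust, fail to model the temporal and spatial dynamics characteristic of real-world scenarios. In response, our study extends matrix game social dilemmas into more complex, higher-dimensional MARL environments. We adapt a gridworld implementation of the Stag Hunt dilemma to more closely match the decision space of a one-shot matrix game while also introducing variable environment complexity. Our findings indicate that as complexity increases, MARL agents trained in these environments converge to suboptimal strategies, consistent with the risk-dominant Nash equilibrium strategies found in matrix games. Our work highlights the impact of environment complexity on achieving optimal outcomes in higher-dimensional game-theoretic MARL environments.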
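
The distinction between the payoff-dominant and risk-dominant equilibria of a Stag Hunt can be made concrete with the standard deviation-loss comparison for symmetric 2x2 games. The sketch below uses that textbook criterion; the payoff numbers are hypothetical and are not the ones used in the paper's gridworld.

```python
def risk_dominant(stag_stag, stag_hare, hare_stag, hare_hare):
    """Risk-dominant equilibrium of a symmetric 2x2 Stag Hunt.

    Arguments are the row player's payoffs, named <own action>_<other's action>.
    Both (stag, stag) and (hare, hare) are assumed to be strict Nash equilibria.
    In a symmetric game, the equilibrium with the larger unilateral deviation
    loss is the risk-dominant one.
    """
    loss_leaving_stag = stag_stag - hare_stag  # cost of deviating from (stag, stag)
    loss_leaving_hare = hare_hare - stag_hare  # cost of deviating from (hare, hare)
    if loss_leaving_hare > loss_leaving_stag:
        return "hare"
    if loss_leaving_stag > loss_leaving_hare:
        return "stag"
    return "neither"


def payoff_dominant(stag_stag, hare_hare):
    """Equilibrium that gives both players the higher payoff."""
    return "stag" if stag_stag > hare_hare else "hare"


if __name__ == "__main__":
    # Hunting stag together pays 4, hunting hare pays 3 regardless,
    # and hunting stag alone pays 0.
    print(payoff_dominant(4, 3), risk_dominant(4, 0, 3, 3))
```

In the example, mutual stag hunting is payoff-dominant but hare is risk-dominant, which is the kind of gap the abstract reports agents falling into as complexity grows.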

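Value-Based Rationales Improve Social Experience: A Multiagent Simulation Study

Categories: Multiagent Systems, Artificial Intelligence, Machine Learning

Authors: Sz-Ting Tzeng, Nirav Ajmeri, Munindar P. Singh

Published: 2024-08-04

Link: http://arxiv.org/abs/2408.02117v1

Abstract: We propose Exanna, a framework for realizing agents that incorporate values in decision making. An Exanna agent considers the values of itself and others when providing rationales for its actions and when evaluating rationales provided by others. Through multiagent simulation, we demonstrate that considering values in decision making and producing rationales, especially for norm-deviating actions, leads to (1) higher conflict resolution, (2) better social experience, (3) higher privacy, and (4) higher flexibility.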

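MALADE: Orchestration of LLM-Powered Agents with Retrieval-Augmented Generation for Pharmacovigilance

Categories: Computation and Language, Artificial Intelligence, Information Retrieval, Machine Learning, Multiagent Systems, Quantitative Methods

Authors: Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page

Published: 2024-08-03

Link: http://arxiv.org/abs/2408.01869v1

Abstract: In the era of large language models (LLMs), given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new LLM-based methods for trustworthy medical knowledge synthesis, extraction, and summarization. This paper focuses on the problem of pharmacovigilance (PhV), where the significance and the challenge lie in identifying adverse drug events (ADEs) from diverse text sources, such as medical literature, clinical notes, and drug labels. Unfortunately, this task is hindered by factors including variations in the terminology of drugs and outcomes, and ADE descriptions often being buried in large amounts of narrative text. We present MALADE, the first effective collaborative multi-agent system powered by LLMs with retrieval-augmented generation for ADE extraction from drug label data. This technique involves augmenting a query to an LLM with relevant information extracted from text resources and instructing the LLM to compose a response consistent with the augmented data. MALADE is a general LLM-agnostic architecture, and its unique capabilities are: (1) leveraging a variety of external sources, such as medical literature, drug labels, and FDA tools (for example, the OpenFDA drug information API), (2) extracting drug-outcome associations in a structured format along with the strength of the association, and (3) providing explanations for established associations. Instantiated with GPT-4 Turbo or GPT-4o and FDA drug label data, MALADE demonstrates its efficacy with an area under the ROC curve of 0.90 against the OMOP Ground Truth table of ADEs. Our implementation uses the Langroid multi-agent LLM framework and can be found at https://github.com/jihyechoi77/malade.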

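Opinion Dynamics with Set-Based Confidence: Convergence Criteria and Periodic Solutions

Categories: Multiagent Systems, Systems and Control, Dynamical Systems, Optimization and Control, Adaptation and Self-Organizing Systems

Authors: Iryna Zabarianska, Anton V. Proskurnikov

Published: 2024-08-03

Link: http://arxiv.org/abs/2408.01753v1

Abstract: This paper introduces a new multidimensional extension of the Hegselmann-Krause (HK) opinion dynamics model, in which opinion proximity is not determined by a norm or metric. Instead, each agent trusts opinions within the Minkowski sum $\xi+\mathcal{O}$, where $\xi$ is the agent's current opinion and $\mathcal{O}$ is the confidence set defining acceptable deviations. During each iteration, agents update their opinions by simultaneously averaging the trusted opinions. Unlike traditional HK systems, where $\mathcal{O}$ is a ball in some norm, our model allows the confidence set to be non-convex and even unbounded. We demonstrate that the new model, referred to as SCOD (Set-based Confidence Opinion Dynamics), can exhibit properties absent in the conventional HK model. Some solutions may converge to non-equilibrium points in the state space, while others oscillate periodically. These "pathologies" disappear if the set $\mathcal{O}$ is symmetric and contains zero in its interior: as in the usual HK model, SCOD then converges to one of the equilibrium points in a finite number of iterations. The latter property is also preserved if one agent is "stubborn" and refuses to change its opinion yet still influences the others; however, two stubborn agents can lead to oscillations.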
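
The SCOD update rule described above can be sketched for scalar opinions, with the confidence set $\mathcal{O}$ supplied as a membership predicate on the deviation $\xi_j - \xi_i$. The example sets and opinions are hypothetical, and the choice to leave an agent's opinion unchanged when it trusts no one (possible when $0 \notin \mathcal{O}$) is an assumption of this sketch, not something the abstract states.

```python
def scod_step(opinions, in_confidence_set):
    """One synchronous SCOD iteration for scalar opinions.

    opinions: list of current opinions, one per agent.
    in_confidence_set: predicate on the deviation (other - own) that is True
        when the deviation lies in the confidence set O.
    Each agent moves to the average of all opinions it currently trusts.
    """
    updated = []
    for own in opinions:
        trusted = [other for other in opinions if in_confidence_set(other - own)]
        # Assumption: with no trusted opinions, the agent keeps its own.
        updated.append(sum(trusted) / len(trusted) if trusted else own)
    return updated


if __name__ == "__main__":
    start = [0.0, 0.5, 3.0]
    # Classical HK model: O is the interval [-1, 1], a ball in the absolute-value norm.
    print(scod_step(start, lambda d: abs(d) <= 1.0))
    # Asymmetric confidence set: agents trust only opinions at or above their own.
    print(scod_step(start, lambda d: 0.0 <= d <= 1.0))
```

Passing the set as a predicate makes non-convex or unbounded choices of $\mathcal{O}$, which the paper allows, as easy to try as the classical interval.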

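Self-Emotion Blended Dialogue Generation in Social Simulation Agents

Categories: Multiagent Systems, Artificial Intelligence, Computation and Language, Computers and Society, I.2.7; I.2; I.6

Authors: Qiang Zhang, Jason Naradowsky, Yusuke Miyao

Published: 2024-08-03

Link: http://arxiv.org/abs/2408.01633v1

Abstract: When engaging in conversations, dialogue agents in a virtual simulation environment may exhibit their own emotional states that are unrelated to the immediate conversational context, a phenomenon known as self-emotion. This study explores how such self-emotion affects agents' behaviors in dialogue strategies and decision-making within a large language model (LLM)-driven simulation framework. In a dialogue strategy prediction experiment, we analyze the dialogue strategy choices made by agents with and without self-emotion and compare them with those of humans. The results show that incorporating self-emotion helps agents exhibit more human-like dialogue strategies. In an independent experiment comparing the performance of models fine-tuned on GPT-4-generated dialogue datasets, we demonstrate that self-emotion leads to better overall naturalness and humanness. Finally, in a virtual simulation environment where agents hold discussions on multiple topics, we show that agents' self-emotion can significantly influence their decision-making process, leading to approximately a 50% change in decisions.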

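Agentic LLM Workflows for Generating Patient-Friendly Medical Reports

Categories: Multiagent Systems

Authors: Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih

Published: 2024-08-02

Link: http://arxiv.org/abs/2408.01112v2

Abstract: The application of large language models (LLMs) in healthcare is expanding rapidly, and one potential use case is translating formal medical reports into patient-legible equivalents. Currently, LLM outputs often need to be edited and evaluated by a human to ensure both factual accuracy and comprehensibility, and this holds for the use case above. We aim to minimize this step by proposing an agentic workflow with the Reflexion framework, which uses iterative self-reflection to correct LLM outputs. The pipeline was tested and compared with zero-shot prompting on 16 randomized radiology reports. In our multi-agent approach, reports had an accuracy rate of 94.94% when verified against ICD-10 codes, compared to 68.23% for zero-shot prompted reports. In addition, 81.25% of the final reflected reports required no corrections for accuracy or readability, while only 25% of zero-shot prompted reports met these criteria without needing modifications. These results indicate that our approach offers a feasible way to communicate clinical findings to patients quickly, efficiently, and coherently while maintaining medical accuracy. The codebase is available at http://github.com/malavikhasudarshan/Multi-Agent-Patient-Letter-Generation.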

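CommonUppRoad: A Framework of Formal Modelling, Verifying, Learning, and Visualisation of Autonomous Vehicles

Categories: Multiagent Systems, Robotics

Authors: Rong Gu, Kaige Tan, Andreas Holck Høeg-Petersen, Lei Feng, Kim Guldstrand Larsen

Published: 2024-08-02

Link: http://arxiv.org/abs/2408.01093v1

Abstract: Combining machine learning and formal methods (FMs) provides a possible solution to the safety issues of autonomous driving (AD) vehicles. However, several gaps must be bridged before this combination becomes practically applicable and useful. To assist researchers in both the FM and AD areas, this paper proposes a framework that combines two well-known tools, namely CommonRoad and UPPAAL. On the one hand, CommonRoad can be enhanced by the rigorous semantics of models in UPPAAL, which enables a systematic and comprehensive understanding of the AD system's behaviour and thus strengthens the safety of the system. On the other hand, controllers synthesised by UPPAAL can be visualised by CommonRoad in real-world road networks, which greatly helps AD vehicle designers adopt formal models in system design. In this framework, we provide automatic model conversions between CommonRoad and UPPAAL. Users therefore only need to program in Python, and the framework takes care of the formal models, learning, and verification in the backend. We perform experiments to demonstrate the applicability of our framework in various AD scenarios, discuss the advantages of solving motion planning in our framework, and show the scalability limits and possible solutions.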

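Learning in Multi-Objective Public Goods Games with Non-Linear Utilities

Categories: Multiagent Systems, Artificial Intelligence, Computer Science and Game Theory

Authors: Nicole Orzan, Erman Acar, Davide Grossi, Patrick Mannion, Roxana Rădulescu

Published: 2024-08-01

Link: http://arxiv.org/abs/2408.00682v1

Abstract: Addressing the question of how to achieve optimal decision-making under risk and uncertainty is crucial for enhancing the capabilities of artificial agents that collaborate with or support humans. In this work, we address this question in the context of Public Goods Games. We study learning in a novel multi-objective version of the Public Goods Game, in which agents have different risk preferences, by means of multi-objective reinforcement learning. We introduce a parametric non-linear utility function to model risk preferences at the level of individual agents, over the collective and individual reward components of the game. We study the interplay between such preference modelling and environmental uncertainty on the incentive alignment level in the game. We demonstrate how different combinations of individual preferences and environmental uncertainties sustain the emergence of cooperative patterns in non-cooperative environments (that is, where competitive strategies are dominant), while others sustain competitive patterns in cooperative environments (that is, where cooperative strategies are dominant).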

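Condorcet's Jury Theorem with Abstention

Categories: Computer Science and Game Theory, Multiagent Systems

Authors: Ganesh Ghalme, Reshef Meir

Published: 2024-08-01

Link: http://arxiv.org/abs/2408.00317v1

Abstract: The well-known Condorcet Jury Theorem states that, as the population size grows to infinity, majority rule selects the better of two available options with probability one. We study this result under an asymmetric two-candidate setup, where the supporters of the two candidates may have different participation costs. When the decision to abstain is fully rational, that is, when the voting pivotality is the probability of a tie, the only equilibrium outcome is a trivial equilibrium in which all voters except those with zero voting cost abstain. We propose and analyze a more practical, boundedly rational model in which voters overestimate their pivotality, and show that under this model non-trivial equilibria emerge in which the winning probability of both candidates is bounded away from one. We show that when the pivotality estimate depends strongly on the margin of victory, victory is not assured for any candidate in any non-trivial equilibrium, regardless of population size and in contrast to Condorcet's assertion. Whereas, under a weak dependence on the margin, Condorcet's Jury Theorem is restored.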
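
The classical statement that the abstract starts from can be checked numerically. The sketch below covers only the textbook setting: an odd number of voters, everyone votes, and each voter independently picks the better option with the same probability. Abstention, participation costs, and the pivotality models studied in the paper are not represented, and the probability 0.6 is an arbitrary illustrative value.

```python
from math import comb


def majority_correct_probability(num_voters, p_correct):
    """Probability that a strict majority of independent voters picks the better option.

    num_voters must be odd so that ties cannot occur.
    """
    if num_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    majority = num_voters // 2 + 1
    return sum(
        comb(num_voters, k) * p_correct ** k * (1.0 - p_correct) ** (num_voters - k)
        for k in range(majority, num_voters + 1)
    )


if __name__ == "__main__":
    for n in [1, 3, 11, 101, 1001]:
        print(n, majority_correct_probability(n, 0.6))
```

When each voter is right more often than not, the printed probability rises toward one as the electorate grows, which is the baseline the paper's equilibrium analysis departs from.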